LLM Thinking Token Budgets
Thinking Token Budget

Token budget parameters for thinking LLMs usually cap how many internal reasoning tokens the model may spend before producing the visible answer. Common names by API/provider include:

max_tokens / max_output_tokens: caps generated output tokens, sometimes including hidden reasoning tokens depending on the API.
reasoning_effort: a qualitative budget such as low, medium, or high; the API maps this to an internal reasoning-token allowance.
thinking_budget / budget_tokens: an explicit number of hidden reasoning tokens allowed, for models that expose thinking controls.
max_completion_tokens: in some APIs, caps both reasoning tokens and final answer tokens together.

Why it matters: ...
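
The accounting behind the parameters listed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it makes no API calls, the names BudgetPlan, plan_shared_cap, plan_from_effort, and EFFORT_PERCENT are hypothetical, and the effort-to-percentage mapping is invented for the example, since each provider decides its own internal allowance.

```python
from dataclasses import dataclass

# ASSUMPTION: illustrative percentages only. Real providers map
# low/medium/high to internal allowances of their own choosing.
EFFORT_PERCENT = {"low": 20, "medium": 50, "high": 80}


@dataclass(frozen=True)
class BudgetPlan:
    reasoning_tokens: int  # hidden reasoning tokens the model may spend
    answer_tokens: int     # tokens left over for the visible answer


def plan_shared_cap(total_cap: int, thinking_budget: int) -> BudgetPlan:
    """Split one cap that covers reasoning and answer tokens together
    (the max_completion_tokens style described above)."""
    if total_cap <= 0 or thinking_budget < 0:
        raise ValueError("total_cap must be positive, thinking_budget non-negative")
    # Reasoning can never use more than the shared cap allows.
    reasoning = min(thinking_budget, total_cap)
    return BudgetPlan(reasoning, total_cap - reasoning)


def plan_from_effort(total_cap: int, effort: str) -> BudgetPlan:
    """Turn a qualitative effort level into a numeric split of a shared cap,
    using the hypothetical EFFORT_PERCENT table."""
    if effort not in EFFORT_PERCENT:
        raise ValueError(f"unknown effort level: {effort!r}")
    thinking_budget = total_cap * EFFORT_PERCENT[effort] // 100
    return plan_shared_cap(total_cap, thinking_budget)


if __name__ == "__main__":
    print(plan_shared_cap(total_cap=4096, thinking_budget=1024))
    print(plan_from_effort(total_cap=1000, effort="high"))
```

Under a shared cap, a thinking budget that reaches the cap leaves no tokens for the visible answer, so it is worth checking answer_tokens before sending a request. Whether a given cap is shared or separate depends on the API, as noted above.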