LLM Thinking Token Budgets

Token budget parameters for thinking LLMs usually cap how many internal reasoning tokens the model may spend before producing the visible answer. Common names by API/provider include:

- max_tokens / max_output_tokens: caps generated output tokens, sometimes including hidden reasoning tokens, depending on the API.
- reasoning_effort: a qualitative budget such as low, medium, or high; the API maps this to an internal reasoning-token allowance.
- thinking_budget / budget_tokens: an explicit number of hidden reasoning tokens allowed, for models that expose thinking controls.
- max_completion_tokens: in some APIs, caps reasoning tokens and final-answer tokens together.

Why it matters: ...
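As a rough illustration of how a qualitative effort level and a combined cap might interact, the sketch below splits a max_completion_tokens-style limit into a reasoning share and an answer share. The percentages in EFFORT_TO_PERCENT and the names TokenPlan and plan_tokens are hypothetical, chosen only for this example; real providers define their own mappings and may not publish them.

```python
from dataclasses import dataclass

# Hypothetical mapping from effort level to the share of the combined cap
# reserved for hidden reasoning. Real APIs pick their own allowances.
EFFORT_TO_PERCENT = {"low": 20, "medium": 50, "high": 80}


@dataclass
class TokenPlan:
    reasoning_tokens: int  # budget for hidden reasoning
    answer_tokens: int     # what remains for the visible answer


def plan_tokens(max_completion_tokens: int, reasoning_effort: str) -> TokenPlan:
    """Split a combined cap into reasoning and answer budgets (illustrative only)."""
    if max_completion_tokens <= 0:
        raise ValueError("max_completion_tokens must be positive")
    if reasoning_effort not in EFFORT_TO_PERCENT:
        raise ValueError(f"unknown reasoning_effort: {reasoning_effort!r}")
    # Integer arithmetic avoids floating-point rounding surprises.
    reasoning = max_completion_tokens * EFFORT_TO_PERCENT[reasoning_effort] // 100
    return TokenPlan(reasoning, max_completion_tokens - reasoning)


if __name__ == "__main__":
    for effort in ("low", "medium", "high"):
        print(effort, plan_tokens(4096, effort))
```

The point of the sketch is that when a single cap covers both reasoning and the final answer, a higher effort setting leaves fewer tokens for the visible output, so the cap may need to grow with the effort level.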

May 25, 2026 · 1 min

Chain of Thought (CoT)

Chain of Thought is a prompting technique in which an AI model is guided, or learns, to reason through a problem step by step before arriving at a final answer, rather than jumping straight to the conclusion. The core idea is that breaking complex reasoning into intermediate steps leads to more accurate and reliable outputs, much as a person might work through a math problem by showing their work. ...
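One simple way to guide a model toward step-by-step reasoning is to append an instruction to the question and then parse the final line of the reply. The sketch below assumes a plain-text reply that ends with a line starting with "Final answer:"; the instruction wording and the helper names build_cot_prompt and extract_final_answer are illustrative choices, not part of any particular API.

```python
from typing import Optional

# Illustrative instruction; the exact wording is an assumption of this sketch.
COT_INSTRUCTION = (
    "Think through the problem step by step, then give the result "
    "on a last line starting with 'Final answer:'."
)
ANSWER_MARKER = "final answer:"


def build_cot_prompt(question: str) -> str:
    """Append the step-by-step instruction to a question."""
    return f"{question.strip()}\n\n{COT_INSTRUCTION}"


def extract_final_answer(response: str) -> Optional[str]:
    """Return the text after the last 'Final answer:' line, or None if absent."""
    for line in reversed(response.splitlines()):
        stripped = line.strip()
        if stripped.lower().startswith(ANSWER_MARKER):
            return stripped[len(ANSWER_MARKER):].strip()
    return None


if __name__ == "__main__":
    print(build_cot_prompt("What is 12 * 3 + 4?"))
    # A hand-written reply standing in for model output.
    sample_reply = "Step 1: 12 * 3 = 36\nStep 2: 36 + 4 = 40\nFinal answer: 40"
    print(extract_final_answer(sample_reply))
```

Separating the intermediate steps from a clearly marked final line keeps the reasoning visible for inspection while making the answer easy to extract programmatically.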

April 23, 2026 · 2 min

Visual Chain-of-Thought Reasoning

Visual chain-of-thought (CoT) reasoning extends standard chain-of-thought prompting to multimodal settings, where the model reasons step by step over visual and textual information together.

Core Idea

In standard CoT, a language model breaks a problem into intermediate reasoning steps before arriving at a final answer. Visual CoT does the same, but the reasoning chain involves interpreting, referencing, and drawing inferences from images, diagrams, charts, or visual scenes alongside text. ...

April 22, 2026 · 2 min