Thinking / Reasoning

Control how much a model reasons before responding. Works across OpenAI, Anthropic, Google Gemini, AWS Bedrock, and Docker Model Runner.

What Is Thinking?

Several modern models support an extended reasoning phase that happens before they produce visible output. During this phase the model plans, evaluates options, and works through the problem — internally, not shown in the response by default. This typically improves accuracy on complex tasks like coding, math, and multi-step planning, at the cost of higher token usage and latency.

docker-agent exposes this through a single thinking_budget field on any named model. The value format differs slightly by provider, but the semantics are the same: higher effort means more thorough reasoning.

Note
Think tool vs. thinking budget
The think tool is a scratchpad for models that lack native reasoning. If your model supports thinking_budget, you do not need the think tool.

Quick Reference

Provider	Format	Values	Default
OpenAI	string	`none`, `minimal`, `low`, `medium`, `high`, `xhigh`, `max`; `xhigh` on gpt-5.2+, `none`/`max` on gpt-5.6+ only, `minimal` dropped on gpt-5.6+	`medium` (API default)
Anthropic	int or str	1024–32768 tokens, or `minimal`–`max`, `adaptive`, `adaptive/<effort>`, `none`	off
Gemini 2.5	int	`0` (off), `-1` (dynamic), or token count (max 24576 / 32768)	`-1` (dynamic)
Gemini 3	string	`minimal`, `low`, `medium`, `high`	API default (model-dependent)
AWS Bedrock	int or str	1024–32768 tokens (`minimal`–`max` mapped to tokens); `adaptive`, `adaptive/<effort>` for Opus 4.6+ (rejected by older Claude models)	off
Docker Model Runner	int or str	token count, `minimal`–`max` (mapped to tokens), `adaptive` (unlimited), `none`	engine default

String values are case-insensitive. The full set of accepted strings is none, minimal, low, medium, high, xhigh, max, adaptive, and adaptive/<effort> — but each provider only honors the subset listed above. Unsupported values either fail at request time (OpenAI) or are mapped/ignored as described per provider below.

thinking_budget is only applied by the providers listed above. Other OpenAI-compatible providers (xAI, Mistral, Ollama, …) currently ignore it — see xAI and Mistral.

OpenAI

OpenAI reasoning models (o-series, gpt-5, gpt-5-mini, gpt-5.6 family) use a string effort level that maps to their reasoning_effort API parameter. The xhigh level requires gpt-5.2+; none and max require gpt-5.6+ (Sol/Terra/Luna); minimal is dropped on gpt-5.6+.

models:
  gpt-thinker:
    provider: openai
    model: gpt-5.6
    thinking_budget: high   # none | minimal | low | medium | high | xhigh | max

Effort levels:

Level	Description
`none`	No reasoning. Sent as-is on gpt-5.6+ (a real API value); on older models it just disables the local `thinking_budget` (the API's own default still applies).
`minimal`	Fastest; lightest reasoning pass. Not accepted on gpt-5.6+ (dropped from the API).
`low`	Quick reasoning for straightforward tasks.
`medium`	Balanced default.
`high`	More thorough; recommended for complex tasks.
`xhigh`	Near-maximum effort; slower but most accurate. Requires gpt-5.2+.
`max`	Maximum effort. Requires gpt-5.6+.

Token counts, adaptive, and adaptive/<effort> are rejected with a configuration error at request time. xhigh is only supported by gpt-5.2 and later minor versions (e.g. gpt-5.2, gpt-5.4-mini); none and max are only supported by gpt-5.6 and later (Sol/Terra/Luna); minimal is not accepted on gpt-5.6+. Older models (o1, o3-mini) only accept low/medium/high — sending an unsupported level returns an API error.

Warning
Tokens and max_tokens
Older OpenAI reasoning models always reason internally — even with thinking_budget: none there are hidden reasoning tokens that count against max_tokens. On gpt-5.6+ (Sol/Terra/Luna), none is a real API value that genuinely disables reasoning. docker-agent automatically raises the output-token floor for its internal low-effort calls (e.g. title generation) so hidden reasoning cannot starve visible text output.

Anthropic

Anthropic Claude supports two thinking modes: a token budget (older models) and adaptive / effort-based thinking (newer models).

Token budget (Claude 4 and earlier)

Set an explicit number of thinking tokens (1024–32768). This must be less than max_tokens:

models:
  claude-thinker:
    provider: anthropic
    model: claude-sonnet-4-5
    thinking_budget: 16384   # tokens reserved for internal reasoning

docker-agent auto-adjusts max_tokens when you set a thinking budget but leave max_tokens at its default. If you set max_tokens explicitly, it must be greater than thinking_budget.

Adaptive thinking (Opus 4.6+ and Sonnet 4.6)

Newer Claude models support adaptive thinking, where the model decides how much to think. Claude Opus 4.6, 4.7, 4.8, and Sonnet 4.6 only support adaptive thinking — they reject token-based budgets. Use adaptive, adaptive/<effort>, or a bare effort level — on Anthropic, a bare effort level like high is shorthand for adaptive thinking at that effort:

models:
  claude-adaptive:
    provider: anthropic
    model: claude-opus-4-6
    thinking_budget: adaptive          # model decides effort (defaults to high)

  claude-adaptive-low:
    provider: anthropic
    model: claude-opus-4-6
    thinking_budget: low               # same as adaptive/low

  claude-adaptive-max:
    provider: anthropic
    model: claude-opus-4-6
    thinking_budget: adaptive/max      # adaptive/low | adaptive/medium | adaptive/high | adaptive/xhigh | adaptive/max

Adaptive effort levels and per-model support:

Level	Opus 4.5	Sonnet 4.5 / Haiku	Sonnet 4.6	Opus 4.6	Opus 4.7 / 4.8	Fable 5	Mythos 5	Mythos preview
`low`	✓	✓	✓	✓	✓	✓	✓	✓
`medium`	✓	✓	✓	✓	✓	✓	✓	✓
`high`	✓	✓	✓	✓	✓	✓	✓	✓
`xhigh`	—	—	—	—	✓	✓	✓	—
`max`	—	—	✓	✓	✓	✓	✓	✓

minimal is treated as low (bare form only). high is the default when adaptive is used without an effort level.

Warning
Effort strings require adaptive-capable models
Every string effort value on Anthropic is sent as adaptive thinking (output_config.effort), which only newer Claude models (Opus 4.6+, Sonnet 4.6) accept. For older models like Sonnet 4.5, use an integer token budget instead. Conversely, models that only support adaptive thinking (Opus 4.6, 4.7, 4.8, Sonnet 4.6) automatically have token budgets coerced to adaptive (a warning is logged).

Disabling thinking

thinking_budget: none   # or 0

Interleaved thinking

Interleaved thinking lets the model reason between tool calls — useful for complex agentic tasks. docker-agent auto-enables it whenever a thinking budget is configured on a Claude model, so you only need to set it explicitly to turn it off:

models:
  claude-interleaved:
    provider: anthropic
    model: claude-sonnet-4-5
    thinking_budget: 16384
    # interleaved_thinking is auto-enabled; disable it explicitly if needed:
    provider_opts:
      interleaved_thinking: false

Note
Temperature and top_p
When extended thinking is enabled, Anthropic requires temperature=1.0. docker-agent automatically suppresses any temperature or top_p settings you have configured — they are silently ignored while thinking is active.

Thinking display

Newer Claude models (Opus 4.7+, Fable 5) hide thinking content by default at the API level. To keep reasoning visible, docker-agent requests summarized thinking whenever adaptive/effort-based thinking is used without an explicit thinking_display. Use thinking_display in provider_opts to override:

models:
  opus-47:
    provider: anthropic
    model: claude-opus-4-7
    thinking_budget: adaptive
    provider_opts:
      thinking_display: omitted   # summarized | display | omitted

Value	Behavior
`summarized`	Thinking blocks returned with a text summary (docker-agent default for adaptive thinking).
`display`	Full thinking blocks returned for display.
`omitted`	Thinking blocks hidden — only the signature is returned.

Full thinking tokens are billed regardless of thinking_display.

Task budget (Anthropic)

task_budget caps total tokens across an entire multi-step agentic task (thinking + tool calls + output combined):

models:
  opus-bounded:
    provider: anthropic
    model: claude-opus-4-7
    thinking_budget: adaptive
    task_budget: 128000   # total token ceiling for the whole task

See the Anthropic provider page for details.

Google Gemini

Gemini 2.5 and Gemini 3 use different formats.

Gemini 2.5 (token budget)

models:
  gemini-off:
    provider: google
    model: gemini-2.5-flash
    thinking_budget: 0      # disable thinking

  gemini-dynamic:
    provider: google
    model: gemini-2.5-flash
    thinking_budget: -1     # let the model decide (default)

  gemini-fixed:
    provider: google
    model: gemini-2.5-flash
    thinking_budget: 8192   # fixed token budget (max 24576 for Flash, 32768 for Pro)

Gemini 3 (level-based)

models:
  gemini3-flash:
    provider: google
    model: gemini-3-flash
    thinking_budget: medium   # minimal | low | medium | high

  gemini3-pro:
    provider: google
    model: gemini-3-pro
    thinking_budget: high     # low | high (Pro supports fewer levels)

AWS Bedrock (Claude)

Bedrock Claude uses a token budget like Anthropic. String effort levels (minimal–max) are mapped automatically:

Effort level	Token budget
`minimal`	1,024
`low`	2,048
`medium`	8,192
`high`	16,384
`xhigh`/`max`	32,768

models:
  bedrock-claude-thinker:
    provider: amazon-bedrock
    model: global.anthropic.claude-sonnet-4-5-20250929-v1:0
    thinking_budget: 8192   # or use an effort level: medium
    provider_opts:
      region: us-east-1

  bedrock-claude-interleaved:
    provider: amazon-bedrock
    model: global.anthropic.claude-sonnet-4-5-20250929-v1:0
    thinking_budget: high
    provider_opts:
      region: us-east-1
      # interleaved_thinking is auto-enabled when thinking_budget is set

Claude Opus 4.6+ on Bedrock requires adaptive thinking — these models reject thinking.type=enabled (token budgets). Configure them with adaptive or adaptive/<effort>; docker-agent auto-coerces token budgets and effort levels on these models with a warning:

models:
  bedrock-opus-adaptive:
    provider: amazon-bedrock
    model: global.anthropic.claude-opus-4-8
    thinking_budget: adaptive/high
    provider_opts:
      region: us-east-1

Warning
Bedrock thinking requirements
Bedrock Claude requires token-based thinking_budget values to be ≥ 1024 and less than max_tokens. docker-agent logs a warning and ignores the budget if either condition is violated. Interleaved thinking requires the interleaved-thinking-2025-05-14 beta header, which docker-agent adds automatically; it is auto-enabled whenever a token thinking budget is set on a Bedrock-hosted Claude model (adaptive thinking interleaves on its own).

Docker Model Runner (local models)

For local models, thinking_budget is forwarded to the inference engine. Both token counts and effort strings work; effort strings map to the same token scale as Bedrock (minimal=1024 … xhigh/max=32768), and adaptive means unlimited:

models:
  local:
    provider: dmr
    model: ai/qwen3
    thinking_budget: medium   # llama.cpp: reasoning-budget=8192; vLLM: thinking_token_budget=8192

llama.cpp: sent as reasoning-budget at model-configure time.
vLLM: sent as thinking_token_budget on each request.
MLX / SGLang: no reasoning-budget knob; the value is silently ignored.

See the Docker Model Runner provider page for details.

xAI (Grok) and Mistral

xAI and Mistral run through docker-agent's OpenAI-compatible client, but the reasoning_effort parameter is only sent for OpenAI reasoning model names (o-series, gpt-5). Setting thinking_budget on Grok or Mistral models currently has no effect — the value is accepted by config validation but never sent to the API.

Grok and Mistral reasoning models (e.g. grok-3-mini, magistral) manage reasoning on their own; for non-reasoning models, consider the think tool instead.

Disabling Thinking

Use none or 0 to disable thinking on any provider:

models:
  fast-model:
    provider: openai
    model: gpt-5-mini
    thinking_budget: none

  gemini-no-think:
    provider: google
    model: gemini-2.5-flash
    thinking_budget: 0

none and 0 clear docker-agent's thinking configuration — no thinking parameter is sent. Models that always reason (OpenAI o-series, gpt-5 through gpt-5.5, Gemini 3) then fall back to the API's default behavior and still reason internally; gpt-5.6+ (Sol/Terra/Luna) sends none as a real API value that genuinely disables reasoning. Models with optional thinking (Gemini 2.5, Claude, local models) are also fully disabled.

Choosing an Effort Level

Task complexity	Recommended level
Simple factual Q&A	`none` / `minimal`
General-purpose chat	`low` / `medium`
Coding, debugging, analysis	`medium` / `high`
Complex reasoning, planning	`high` / `xhigh`
Research, difficult math/logic	`xhigh` / `max`
Long agentic tasks (Anthropic)	`adaptive`

Changing Thinking Level at Runtime

While running in the TUI, press Shift+Tab to cycle the thinking effort level for the current model without editing your YAML config, or type /effort <level> to jump straight to a specific level (e.g. /effort high). Running /effort without an argument opens a picker listing the levels the current model supports:

The level steps through the model's supported range (model-specific), wrapping around — for example none → minimal → low → medium → high → none on OpenAI gpt-5/o-series, none → minimal → low → medium → high → xhigh → none on gpt-5.2+, none → low → medium → high → xhigh → max → none on gpt-5.6+ (no minimal), none → low → medium → high → max → none on Anthropic Opus 4.6 and Sonnet 4.6, and none → low → medium → high → xhigh → max → none on Anthropic Opus 4.7+, Fable 5, and Mythos 5. For older Anthropic models (e.g. Sonnet 4.5) that only accept token budgets, effort-string cycling has no effect — use an integer thinking_budget in your YAML config instead.
The current level is shown in the sidebar next to the model name (e.g. openai/gpt-5 • high).
This applies as a session override — it is not saved to the config file. The next session starts from the level defined in your YAML.
For models that don't support reasoning, and for remote runtimes, Shift+Tab is a no-op and an informational message is displayed.
/effort only accepts levels the current model supports; requesting an unsupported level shows the model's supported list. Like Shift+Tab, it is unavailable for non-reasoning models and remote runtimes.

Define a provider with a default thinking_budget and all models that reference it inherit it:

providers:
  deep-anthropic:
    provider: anthropic
    thinking_budget: adaptive/high
    max_tokens: 32768

models:
  claude-smart:
    provider: deep-anthropic
    model: claude-opus-4-6     # inherits thinking_budget: adaptive/high

  claude-faster:
    provider: deep-anthropic
    model: claude-opus-4-6
    thinking_budget: low       # overrides to adaptive/low

Full Example

See examples/thinking_budget.yaml for a runnable config covering all providers.

What can I help you with?

Thinking / Reasoning