Models
Models are the AI brains behind your agents. docker-agent supports multiple providers and flexible configuration.
Inline vs. Named Models
There are two ways to assign a model to an agent:
Inline (Quick)
Use the provider/model shorthand directly in the agent definition:

    agents:
      root:
        model: openai/gpt-5
        instruction: You are a helpful assistant.

Named (Full Control)
Define models in a models section and reference them by name:

    models:
      claude:
        provider: anthropic
        model: claude-sonnet-4-5
        max_tokens: 64000
        temperature: 0.7
    agents:
      root:
        model: claude
        instruction: You are a helpful assistant.

Named models let you configure temperature, token limits, thinking budgets, and other parameters. They're also reusable across multiple agents.
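As a minimal sketch of that reuse, the config below points two agents at the same named model. The second agent name (`reviewer`) and its instruction are hypothetical and only illustrate the reference; how agents hand work to each other is outside this sketch.

```yaml
models:
  claude:
    provider: anthropic
    model: claude-sonnet-4-5
    max_tokens: 64000
    temperature: 0.7

agents:
  root:
    model: claude      # references the named model above
    instruction: You are a helpful assistant.
  reviewer:            # hypothetical second agent, for illustration only
    model: claude      # same named model, so the same settings apply
    instruction: You review answers for accuracy.
```

Changing `temperature` or `max_tokens` once under `models` then affects every agent that references `claude`.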
First Available Models
A named model can also select the first usable model from a priority list. This is useful for shared configs that should prefer paid cloud models when their API keys are present, but still work with a local fallback:

    models:
      smart:
        first_available:
          - anthropic/claude-sonnet-4-5
          - openai/gpt-5
          - dmr/ai/qwen3
    agents:
      root:
        model: smart
        instruction: You are a helpful assistant.

At load time, docker-agent selects the first candidate whose credentials are configured. You only need credentials for one candidate. See Model Configuration for details.
Supported Providers
| Provider | Key | Example Models | API Key Env Var |
|---|---|---|---|
| OpenAI | openai | gpt-5, gpt-5-mini, gpt-4o | OPENAI_API_KEY |
| Anthropic | anthropic | claude-sonnet-4-5, claude-opus-4-7 | ANTHROPIC_API_KEY |
| Google Gemini | google | gemini-3.5-flash, gemini-3-pro | GOOGLE_API_KEY / GEMINI_API_KEY |
| AWS Bedrock | amazon-bedrock | Claude, Nova, Llama models | AWS credentials |
| Docker Model Runner | dmr | ai/qwen3, ai/llama3.2 | None (local) |
| Mistral | mistral | Mistral models | MISTRAL_API_KEY |
| xAI | xai | Grok models | XAI_API_KEY |
| Nebius | nebius | Open-source and specialised models | NEBIUS_API_KEY |
| MiniMax | minimax | MiniMax models | MINIMAX_API_KEY |
| Baseten | baseten | DeepSeek, Kimi, GLM, Llama models | BASETEN_API_KEY |
| OVHcloud | ovhcloud | Qwen, Llama, Mistral, DeepSeek (EU-hosted) | OVH_AI_ENDPOINTS_ACCESS_TOKEN |
| Groq | groq | Llama, Qwen, GPT-OSS (fast inference) | GROQ_API_KEY |
| Fireworks AI | fireworks | Kimi, Llama, Qwen, DeepSeek, GLM (open models) | FIREWORKS_API_KEY |
| DeepSeek | deepseek | DeepSeek-V3 chat and R1 reasoner | DEEPSEEK_API_KEY |
| Cerebras | cerebras | GPT-OSS, GLM (fast inference) | CEREBRAS_API_KEY |
| Together AI | together | Llama, Qwen, DeepSeek, Kimi (open models) | TOGETHER_API_KEY |
| Hugging Face | huggingface | Llama, Qwen, DeepSeek, GLM (open models) | HF_TOKEN |
| Cloudflare Workers AI | cloudflare-workers-ai | Llama, Mistral, Qwen, Gemma (edge-hosted open models) | CLOUDFLARE_API_TOKEN + CLOUDFLARE_ACCOUNT_ID |
| Moonshot AI | moonshot | Kimi K2 chat, reasoning, and coding models | MOONSHOT_API_KEY |
| Vercel AI Gateway | vercel | Multi-provider gateway | AI_GATEWAY_API_KEY |
| Cloudflare AI Gateway | cloudflare-ai-gateway | Multi-provider gateway | CLOUDFLARE_API_TOKEN + CLOUDFLARE_ACCOUNT_ID + CLOUDFLARE_GATEWAY_ID |
| Requesty | requesty | Multi-provider gateway | REQUESTY_API_KEY |
| OpenRouter | openrouter | Multi-provider gateway | OPENROUTER_API_KEY |
| Azure OpenAI | azure | gpt-4o, gpt-5 on Azure | AZURE_API_KEY + base_url |
| Ollama | ollama | Any local Ollama model | None (local; optional base_url) |
| GitHub Copilot | github-copilot | Copilot-hosted OpenAI/Anthropic | GITHUB_TOKEN (PAT with copilot) |
See the Model Providers section for detailed configuration guides.
Model Properties
| Property | Type | Description |
|---|---|---|
| provider | string | Provider identifier (required) |
| model | string | Model name (required) |
| temperature | float | Randomness: 0.0 (deterministic) to 1.0 (creative) |
| max_tokens | int | Maximum response length |
| top_p | float | Nucleus sampling: 0.0 to 1.0 |
| frequency_penalty | float | Reduce repetition: 0.0 to 2.0 |
| presence_penalty | float | Encourage topic diversity: 0.0 to 2.0 |
| base_url | string | Custom API endpoint |
| thinking_budget | string/int | Reasoning effort configuration |
| task_budget | int/object | Total token budget for an agentic task (Anthropic; honored by Opus 4.7 today) |
| provider_opts | object | Provider-specific options |
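A sketch of how several of these properties might be combined on one named model. The model name `tuned` and the specific values are illustrative assumptions picked from the ranges in the table, not recommended settings, and whether a given provider honors every property is not confirmed here.

```yaml
models:
  tuned:                     # hypothetical name
    provider: openai         # required
    model: gpt-4o            # required
    temperature: 0.2         # closer to deterministic
    max_tokens: 4096         # cap on response length
    top_p: 0.9               # nucleus sampling
    frequency_penalty: 0.5   # discourage repetition
    presence_penalty: 0.0    # no extra push toward new topics

agents:
  root:
    model: tuned
    instruction: You are a helpful assistant.
```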
Reasoning / Thinking Budget
Control how much the model "thinks" before responding:
| Provider | Format | Values | Default |
|---|---|---|---|
| OpenAI | string | minimal, low, medium, high, xhigh | medium (always-reasoning models only) |
| Anthropic | int or string | 1024–32768 tokens, adaptive, adaptive/<effort>, or an effort level | off |
| Gemini 2.5 | int | 0 (off), -1 (dynamic), or token count | -1 (dynamic) |
| Gemini 3 | string | minimal, low, medium, high | varies |
| All | string/int | none or 0 to disable | — |
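The string-valued forms in the table might be written as follows. This is a sketch under assumptions: the model names (`quick`, `adaptive-thinker`, `gemini-deep`) are hypothetical, and each value is simply taken from the corresponding row of the table.

```yaml
models:
  quick:                       # hypothetical name
    provider: openai
    model: gpt-5
    thinking_budget: low       # OpenAI: effort level as a string
  adaptive-thinker:            # hypothetical name
    provider: anthropic
    model: claude-sonnet-4-5
    thinking_budget: adaptive  # Anthropic: string form instead of a token count
  gemini-deep:                 # hypothetical name
    provider: google
    model: gemini-3-pro
    thinking_budget: high      # Gemini 3: effort level as a string
```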

    models:
      deep-thinker:
        provider: anthropic
        model: claude-sonnet-4-5
        thinking_budget: 16384
      fast-responder:
        provider: openai
        model: gpt-5
        thinking_budget: none # disable thinking

Note: Multi-provider teams
Different agents can use different providers in the same config. See Multi-Agent for patterns.
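For instance, a config along these lines would mix a cloud provider and a local one. The `helper` agent and its instruction are hypothetical; the provider/model strings are taken from the examples earlier on this page.

```yaml
agents:
  root:
    model: anthropic/claude-sonnet-4-5   # needs ANTHROPIC_API_KEY
    instruction: You are a helpful assistant.
  helper:                                # hypothetical second agent
    model: dmr/ai/qwen3                  # Docker Model Runner, local, no API key
    instruction: You summarize long documents.
```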
Alloy Models
Alloy models let you use more than one model in the same conversation. docker-agent alternates between them to leverage the strengths of each. List the models, separated by commas, in the agent's model field:

    agents:
      root:
        model: anthropic/claude-sonnet-4-5,openai/gpt-5
        instruction: You are a helpful assistant.

Read more about the alloy model concept at xbow.com/blog/alloy-agents.