# Local models with Docker Model Runner
Docker Model Runner lets you run AI models locally on your machine. No API keys, no recurring costs, and your data stays private.
## Why use local models
Running models locally keeps your data on your machine and removes the need for API keys or usage-based billing. Once a model is downloaded, you can work entirely offline, making Docker Model Runner (DMR) a practical alternative to cloud model providers.
## Prerequisites
You need Docker Model Runner installed and running:
- Docker Desktop (macOS/Windows): Enable Docker Model Runner in Settings > AI > Enable Docker Model Runner. See Get started with DMR for detailed instructions.
- Docker Engine (Linux): Install with `sudo apt-get install docker-model-plugin` or `sudo dnf install docker-model-plugin`. See Get started with DMR.
Verify Docker Model Runner is available:
```
$ docker model version
```
If the command returns version information, you're ready to use local models.
## Using models with DMR
Docker Model Runner can run any compatible model. Models can come from:
- Docker Hub repositories (`docker.io/namespace/model-name`)
- Your own OCI artifacts packaged and pushed to any registry
- HuggingFace models directly (`hf.co/org/model-name`) - see the sketch below
- The Docker Model catalog in Docker Desktop
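For example, a HuggingFace-hosted model can be referenced directly in an agent definition. This is a sketch: it assumes the shorthand `provider/model` form used later in this guide also accepts `hf.co/` references, and `org/model-name` is a placeholder rather than a real model:

```yaml
agents:
  root:
    # Placeholder reference; substitute a real hf.co model name
    model: dmr/hf.co/org/model-name
    instruction: You are a helpful assistant
```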
To see models available through the Docker catalog, run:
```
$ docker model list --available
```
To use a model, reference it in your configuration. DMR automatically pulls models on first use if they're not already local.
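If you prefer to download a model before running your agent, you can pull it explicitly with `docker model pull` (shown here with the `ai/qwen3` model used in the next section):

```
$ docker model pull ai/qwen3
```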
## Configuration
Configure your agent to use Docker Model Runner with the `dmr` provider:
```yaml
agents:
  root:
    model: dmr/ai/qwen3
    instruction: You are a helpful assistant
    toolsets:
      - type: filesystem
```

When you first run your agent, cagent prompts you to pull the model if it's not already available locally:
```
$ cagent run agent.yaml
Model not found locally. Do you want to pull it now? ([y]es/[n]o)
```
## How it works
When you configure an agent to use DMR, cagent automatically connects to your local Docker Model Runner and routes inference requests to it. If a model isn't available locally, cagent prompts you to pull it on first use. No API keys or authentication are required.
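Under the hood, DMR exposes an OpenAI-compatible HTTP API. As a rough sketch, you can query it directly with curl; this assumes host-side TCP access is enabled on the default port 12434, and the exact endpoint path may differ for your setup:

```
$ curl http://localhost:12434/engines/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{"model": "ai/qwen3", "messages": [{"role": "user", "content": "Hello"}]}'
```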
## Advanced configuration
For more control over model behavior, define a model configuration:
```yaml
models:
  local-qwen:
    provider: dmr
    model: ai/qwen3:14B
    temperature: 0.7
    max_tokens: 8192

agents:
  root:
    model: local-qwen
    instruction: You are a helpful coding assistant
```
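A named model configuration can also be shared by several agents, so they all run against the same local model. A minimal sketch, assuming the usual cagent multi-agent layout (the `reviewer` agent and its instruction are illustrative):

```yaml
agents:
  root:
    model: local-qwen
    instruction: You are a helpful coding assistant
    sub_agents: [reviewer]
  reviewer:
    # Hypothetical sub-agent reusing the same named model
    model: local-qwen
    instruction: You review code changes for correctness
```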
### Faster inference with speculative decoding

Speed up model responses using speculative decoding with a smaller draft model:
```yaml
models:
  fast-qwen:
    provider: dmr
    model: ai/qwen3:14B
    provider_opts:
      speculative_draft_model: ai/qwen3:0.6B-Q4_K_M
      speculative_num_tokens: 16
      speculative_acceptance_rate: 0.8
```

The draft model generates token candidates, and the main model validates them. Because the draft and main models must share a tokenizer, pick a small model from the same family, as with `ai/qwen3:0.6B-Q4_K_M` here. This can significantly improve throughput for longer responses.
### Runtime flags
Pass engine-specific flags to optimize performance:
```yaml
models:
  optimized-qwen:
    provider: dmr
    model: ai/qwen3
    provider_opts:
      runtime_flags: ["--ngl=33", "--threads=8"]
```

Common flags:

- `--ngl` - Number of GPU layers
- `--threads` - CPU thread count
- `--repeat-penalty` - Repetition penalty
## Using DMR for RAG
Docker Model Runner supports both embeddings and reranking for RAG workflows.
### Embedding with DMR
Use local embeddings for indexing your knowledge base:
```yaml
rag:
  codebase:
    docs: [./src]
    strategies:
      - type: chunked-embeddings
        embedding_model: dmr/ai/embeddinggemma
        database: ./code.db
```

### Reranking with DMR
DMR provides native reranking for improved RAG results:
```yaml
models:
  reranker:
    provider: dmr
    model: hf.co/ggml-org/qwen3-reranker-0.6b-q8_0-gguf

rag:
  docs:
    docs: [./documentation]
    strategies:
      - type: chunked-embeddings
        embedding_model: dmr/ai/embeddinggemma
        limit: 20
    results:
      reranking:
        model: reranker
        threshold: 0.5
        limit: 5
```

Here the embedding stage retrieves up to 20 candidate chunks; the reranker then scores them and keeps at most 5 results with a relevance score of at least 0.5. Native DMR reranking is the fastest option for reranking RAG results.
## Troubleshooting
If cagent can't find Docker Model Runner:
Verify Docker Model Runner status:

```
$ docker model status
```

Check available models:

```
$ docker model list
```

Check model logs for errors:

```
$ docker model logs
```

Ensure Docker Desktop has Model Runner enabled in settings (macOS/Windows).
## What's next
- Follow the tutorial to build your first agent with local models
- Learn about RAG to give your agents access to codebases and documentation
- See the configuration reference for all DMR options