Configuration file reference

This reference documents the YAML configuration file format for cagent agents. It covers file structure, agent parameters, model configuration, toolset setup, and RAG sources.

For detailed documentation of each toolset's capabilities and specific options, see the Toolsets reference.

File structure

A configuration file has four top-level sections:

agents: # Required - agent definitions
  root:
    model: anthropic/claude-sonnet-4-5
    description: What this agent does
    instruction: How it should behave

models: # Optional - model configurations
  custom_model:
    provider: openai
    model: gpt-5

rag: # Optional - RAG sources
  docs:
    docs: [./documents]
    strategies: [...]

metadata: # Optional - author, license, readme
  author: Your Name

Agents

Property              Type     Description                                      Required
model                 string   Model reference or name                          Yes
description           string   Brief description of agent's purpose             No
instruction           string   Detailed behavior instructions                   Yes
sub_agents            array    Agent names for task delegation                  No
handoffs              array    Agent names for conversation handoff             No
toolsets              array    Available tools                                  No
welcome_message       string   Message displayed on start                       No
add_date              boolean  Include current date in context                  No
add_environment_info  boolean  Include working directory, OS, Git info          No
add_prompt_files      array    Prompt file paths to include                     No
max_iterations        integer  Maximum tool call loops (unlimited if not set)   No
num_history_items     integer  Conversation history limit                       No
code_mode_tools       boolean  Enable Code Mode for tools                       No
commands              object   Named prompts accessible via /command_name      No
structured_output     object   JSON schema for structured responses             No
rag                   array    RAG source names                                 No
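
For illustration, here is an agent entry combining several of these optional properties (the values are examples, not defaults):

agents:
  root:
    model: anthropic/claude-sonnet-4-5
    description: Support assistant
    instruction: Answer questions about the project
    welcome_message: Hi! Ask me anything about this repository.
    add_date: true
    add_environment_info: true
    max_iterations: 20
    num_history_items: 50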

Task delegation versus conversation handoff

Use sub_agents to break work into tasks. The root agent assigns work to a sub-agent and gets results back while staying in control.

Use handoffs to transfer the entire conversation to a different agent. The new agent takes over completely.
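
A sketch contrasting the two (agent names and instructions are illustrative):

agents:
  root:
    model: anthropic/claude-sonnet-4-5
    description: Coordinator
    instruction: Delegate research tasks; hand off billing questions entirely
    sub_agents: [researcher] # delegates work, stays in control
    handoffs: [billing] # transfers the conversation completely

  researcher:
    model: anthropic/claude-sonnet-4-5
    description: Research assistant
    instruction: Gather information and report back

  billing:
    model: anthropic/claude-sonnet-4-5
    description: Billing specialist
    instruction: Handle billing conversations end to end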

Commands

Commands are named prompts that users invoke with /command_name. They support JavaScript template literals, with ${env.VARIABLE} for environment variables:

commands:
  greet: "Say hello to ${env.USER}"
  analyze: "Analyze ${env.PROJECT_NAME || 'demo'}"

Run with: cagent run config.yaml /greet

Structured output

Constrain responses to a JSON schema (OpenAI and Gemini only):

structured_output:
  name: code_analysis
  strict: true
  schema:
    type: object
    properties:
      issues:
        type: array
        items: { ... }
    required: [issues]

Models

Property             Type     Description                                      Required
provider             string   openai, anthropic, google, dmr                   Yes
model                string   Model name                                       Yes
temperature          float    Randomness (0.0-2.0)                             No
max_tokens           integer  Maximum response length                          No
top_p                float    Nucleus sampling (0.0-1.0)                       No
frequency_penalty    float    Repetition penalty (-2.0 to 2.0, OpenAI only)    No
presence_penalty     float    Topic penalty (-2.0 to 2.0, OpenAI only)         No
base_url             string   Custom API endpoint                              No
parallel_tool_calls  boolean  Enable parallel tool execution (default: true)   No
token_key            string   Authentication token key                         No
track_usage          boolean  Track token usage                                No
thinking_budget      mixed    Reasoning effort (provider-specific)             No
provider_opts        object   Provider-specific options                        No
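
For example, a model entry exercising several of these properties (values are illustrative):

models:
  precise_gpt:
    provider: openai
    model: gpt-5
    temperature: 0.2
    top_p: 0.9
    max_tokens: 4096
    frequency_penalty: 0.2
    parallel_tool_calls: false
    track_usage: true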

Alloy models

Use multiple models in rotation by separating names with commas:

model: anthropic/claude-sonnet-4-5,openai/gpt-5

Thinking budget

Controls reasoning depth. Configuration varies by provider:

  • OpenAI: String values - minimal, low, medium, high
  • Anthropic: Integer token budget (1024-32768, must be less than max_tokens)
    • Set provider_opts.interleaved_thinking: true for tool use during reasoning
  • Gemini: Integer token budget (0 to disable, -1 for dynamic, max 24576)
    • Gemini 2.5 Pro: 128-32768, cannot disable (minimum 128)

# OpenAI
thinking_budget: low

# Anthropic
thinking_budget: 8192
provider_opts:
  interleaved_thinking: true

# Gemini
thinking_budget: 8192    # Fixed
thinking_budget: -1      # Dynamic
thinking_budget: 0       # Disabled

Docker Model Runner (DMR)

Run local models. If base_url is omitted, cagent auto-discovers the endpoint via the Docker Model plugin.

provider: dmr
model: ai/qwen3
max_tokens: 8192
base_url: http://localhost:12434/engines/llama.cpp/v1 # Optional

Pass llama.cpp options via provider_opts.runtime_flags (array, string, or multiline):

provider_opts:
  runtime_flags: ["--ngl=33", "--threads=8"]
  # or: runtime_flags: "--ngl=33 --threads=8"

Model config fields auto-map to runtime flags:

  • temperature → --temp
  • top_p → --top-p
  • max_tokens → --context-size

Explicit runtime_flags override auto-mapped flags.
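
For example, assuming the mappings above, temperature and max_tokens in this config are auto-mapped, while the explicit --temp in runtime_flags takes precedence:

models:
  local:
    provider: dmr
    model: ai/qwen3
    temperature: 0.7 # would auto-map to --temp=0.7
    max_tokens: 8192 # auto-maps to --context-size=8192
    provider_opts:
      runtime_flags: ["--temp=0.5"] # explicit flag overrides the auto-mapped --temp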

Speculative decoding for faster inference:

provider_opts:
  speculative_draft_model: ai/qwen3:0.6B-F16
  speculative_num_tokens: 16
  speculative_acceptance_rate: 0.8

Tools

Configure tools in the toolsets array. There are three types: built-in, MCP (local or remote), and Docker MCP Gateway.

Note

For detailed documentation of each toolset's capabilities, available tools, and specific configuration options, see the Toolsets reference.

All toolsets support common properties like tools (whitelist), defer (deferred loading), toon (output compression), env (environment variables), and instruction (usage guidance). See the Toolsets reference for details on these properties and what each toolset does.
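
As a sketch, the common properties applied to a single MCP toolset (assuming defer and toon are boolean flags, as their descriptions suggest):

toolsets:
  - type: mcp
    command: npx
    args: ["-y", "@modelcontextprotocol/server-filesystem", "/path/to/allowed/files"]
    tools: ["read_file"] # whitelist a single tool
    defer: true # assumed boolean: load the toolset only when needed
    toon: true # assumed boolean: compress tool output
    env:
      NODE_OPTIONS: "--max-old-space-size=8192"
    instruction: Only read files the user names explicitly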

Built-in tools

toolsets:
  - type: filesystem
  - type: shell
  - type: think
  - type: todo
    shared: true
  - type: memory
    path: ./memory.db

MCP tools

Local process:

- type: mcp
  command: npx
  args:
    ["-y", "@modelcontextprotocol/server-filesystem", "/path/to/allowed/files"]
  tools: ["read_file", "write_file"] # Optional: limit to specific tools
  env:
    NODE_OPTIONS: "--max-old-space-size=8192"

Remote server:

- type: mcp
  remote:
    url: https://mcp-server.example.com
    transport_type: sse
    headers:
      Authorization: Bearer token

Docker MCP Gateway

Containerized tools from Docker MCP Catalog:

- type: mcp
  ref: docker:duckduckgo

RAG

Retrieval-augmented generation for document knowledge bases. Define sources at the top level and reference them by name in agents.

rag:
  docs:
    docs: [./documents, ./README.md]
    strategies:
      - type: chunked-embeddings
        embedding_model: openai/text-embedding-3-small
        vector_dimensions: 1536
        database: ./embeddings.db

agents:
  root:
    rag: [docs]

Retrieval strategies

All strategies support chunking configuration. Chunk size and overlap are measured in characters (Unicode code points), not tokens. For example, with size: 1000 and overlap: 75, each chunk shares its last 75 characters with the start of the next chunk.

Chunked-embeddings

Direct semantic search using vector embeddings. Best for understanding intent, synonyms, and paraphrasing.

Field                             Type     Default
embedding_model                   string   -
database                          string   -
vector_dimensions                 integer  -
similarity_metric                 string   cosine
threshold                         float    0.5
limit                             integer  5
chunking.size                     integer  1000
chunking.overlap                  integer  75
chunking.respect_word_boundaries  boolean  true
chunking.code_aware               boolean  false

- type: chunked-embeddings
  embedding_model: openai/text-embedding-3-small
  vector_dimensions: 1536
  database: ./vector.db
  similarity_metric: cosine
  threshold: 0.5
  limit: 10
  chunking:
    size: 1000
    overlap: 100

Semantic-embeddings

LLM-enhanced semantic search. Uses a language model to generate rich semantic summaries of each chunk before embedding, capturing deeper meaning.

Field                             Type     Default
embedding_model                   string   -
chat_model                        string   -
database                          string   -
vector_dimensions                 integer  -
similarity_metric                 string   cosine
threshold                         float    0.5
limit                             integer  5
ast_context                       boolean  false
semantic_prompt                   string   -
chunking.size                     integer  1000
chunking.overlap                  integer  75
chunking.respect_word_boundaries  boolean  true
chunking.code_aware               boolean  false

- type: semantic-embeddings
  embedding_model: openai/text-embedding-3-small
  vector_dimensions: 1536
  chat_model: openai/gpt-5-mini
  database: ./semantic.db
  threshold: 0.3
  limit: 10
  chunking:
    size: 1000
    overlap: 100

BM25

Keyword-based search using the BM25 algorithm. Best for exact terms, technical jargon, and code identifiers.

Field                             Type     Default
database                          string   -
k1                                float    1.5
b                                 float    0.75
threshold                         float    0.0
limit                             integer  5
chunking.size                     integer  1000
chunking.overlap                  integer  75
chunking.respect_word_boundaries  boolean  true
chunking.code_aware               boolean  false

- type: bm25
  database: ./bm25.db
  k1: 1.5
  b: 0.75
  threshold: 0.3
  limit: 10
  chunking:
    size: 1000
    overlap: 100

Hybrid retrieval

Combine multiple strategies with fusion:

strategies:
  - type: chunked-embeddings
    embedding_model: openai/text-embedding-3-small
    vector_dimensions: 1536
    database: ./vector.db
    limit: 20
  - type: bm25
    database: ./bm25.db
    limit: 15

results:
  fusion:
    strategy: rrf # Options: rrf, weighted, max
    k: 60 # RRF smoothing parameter
  deduplicate: true
  limit: 5

Fusion strategies:

  • rrf: Reciprocal Rank Fusion (recommended, rank-based, no normalization needed)
  • weighted: Weighted combination (fusion.weights: {chunked-embeddings: 0.7, bm25: 0.3}; see the sketch below)
  • max: Maximum score across strategies
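
For example, a weighted-fusion sketch based on the weights syntax above:

results:
  fusion:
    strategy: weighted
    weights:
      chunked-embeddings: 0.7
      bm25: 0.3
  deduplicate: true
  limit: 5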

Reranking

Re-score results with a specialized model for improved relevance:

results:
  reranking:
    model: openai/gpt-5-mini
    top_k: 10 # Only rerank top K (0 = all)
    threshold: 0.3 # Minimum score after reranking
    criteria: | # Optional domain-specific guidance
      Prioritize official docs over blog posts
  limit: 5

DMR native reranking:

models:
  reranker:
    provider: dmr
    model: hf.co/ggml-org/qwen3-reranker-0.6b-q8_0-gguf

results:
  reranking:
    model: reranker

Code-aware chunking

For source code, use AST-based chunking. With semantic-embeddings, you can include AST metadata in the LLM prompts:

- type: semantic-embeddings
  embedding_model: openai/text-embedding-3-small
  vector_dimensions: 1536
  chat_model: openai/gpt-5-mini
  database: ./code.db
  ast_context: true # Include AST metadata in semantic prompts
  chunking:
    size: 2000
    code_aware: true # Enable AST-based chunking

RAG properties

Top-level RAG source:

Field       Type      Description
docs        []string  Document paths (supports glob patterns, respects .gitignore)
tool        object    Customize RAG tool name/description/instruction
strategies  []object  Retrieval strategies (see above for strategy-specific fields)
results     object    Post-processing (fusion, reranking, limits)
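
For example, a sketch customizing the RAG tool via the tool object; the sub-field names (name, description, instruction) are inferred from the description above, not confirmed by the source:

rag:
  docs:
    docs: [./documents]
    tool:
      name: search_docs # inferred sub-field names
      description: Search the project documentation
      instruction: Call this before answering documentation questions
    strategies:
      - type: bm25
        database: ./bm25.db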

Results:

Field                Type     Default
limit                integer  15
deduplicate          boolean  true
include_score        boolean  false
fusion.strategy      string   -
fusion.k             integer  60
fusion.weights       object   -
reranking.model      string   -
reranking.top_k      integer  0
reranking.threshold  float    0.5
reranking.criteria   string   ""
return_full_content  boolean  false
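
For example, a results block exercising the scoring and content fields above (values are illustrative):

results:
  limit: 10
  deduplicate: true
  include_score: true # attach relevance scores to results
  return_full_content: true # return full content (assumed semantics)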

Metadata

Documentation and sharing information:

Property  Type    Description
author    string  Author name
license   string  License (e.g., MIT, Apache-2.0)
readme    string  Usage documentation

metadata:
  author: Your Name
  license: MIT
  readme: |
    Description and usage instructions

Example configuration

Complete configuration demonstrating key features:

agents:
  root:
    model: claude
    description: Technical lead
    instruction: Coordinate development tasks and delegate to specialists
    sub_agents: [developer, reviewer]
    toolsets:
      - type: filesystem
      - type: mcp
        ref: docker:duckduckgo
    rag: [readmes]
    commands:
      status: "Check project status"

  developer:
    model: gpt
    description: Software developer
    instruction: Write clean, maintainable code
    toolsets:
      - type: filesystem
      - type: shell

  reviewer:
    model: claude
    description: Code reviewer
    instruction: Review for quality and security
    toolsets:
      - type: filesystem

models:
  gpt:
    provider: openai
    model: gpt-5

  claude:
    provider: anthropic
    model: claude-sonnet-4-5
    max_tokens: 64000

rag:
  readmes:
    docs: ["**/README.md"]
    strategies:
      - type: chunked-embeddings
        embedding_model: openai/text-embedding-3-small
        vector_dimensions: 1536
        database: ./embeddings.db
        limit: 10
      - type: bm25
        database: ./bm25.db
        limit: 10
    results:
      fusion:
        strategy: rrf
        k: 60
      limit: 5
