Configuration file reference

This reference documents the YAML configuration file format for cagent agents. It covers file structure, agent parameters, model configuration, toolset setup, and RAG sources.

For detailed documentation of each toolset's capabilities and specific options, see the Toolsets reference.

File structure

A configuration file has four top-level sections:

agents: # Required - agent definitions
  root:
    model: anthropic/claude-sonnet-4-5
    description: What this agent does
    instruction: How it should behave

models: # Optional - model configurations
  custom_model:
    provider: openai
    model: gpt-5

rag: # Optional - RAG sources
  docs:
    docs: [./documents]
    strategies: [...]

metadata: # Optional - author, license, readme
  author: Your Name

Agents

Property              Type     Description                                      Required
model                 string   Model reference or name                          Yes
description           string   Brief description of agent's purpose             No
instruction           string   Detailed behavior instructions                   Yes
sub_agents            array    Agent names for task delegation                  No
handoffs              array    Agent names for conversation handoff             No
toolsets              array    Available tools                                  No
welcome_message       string   Message displayed on start                       No
add_date              boolean  Include current date in context                  No
add_environment_info  boolean  Include working directory, OS, Git info          No
add_prompt_files      array    Prompt file paths to include                     No
max_iterations        integer  Maximum tool call loops (unlimited if not set)   No
num_history_items     integer  Conversation history limit                       No
code_mode_tools       boolean  Enable Code Mode for tools                       No
commands              object   Named prompts accessible via /command_name      No
structured_output     object   JSON schema for structured responses             No
rag                   array    RAG source names                                 No
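
For illustration, here is an agent entry combining several of these optional properties (the values are examples, not defaults):

agents:
  root:
    model: anthropic/claude-sonnet-4-5
    description: Support assistant
    instruction: Answer questions about the project
    welcome_message: Hi! Ask me anything about this repository.
    add_date: true
    add_environment_info: true
    max_iterations: 20
    num_history_items: 50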

Task delegation versus conversation handoff

Use sub_agents to break work into tasks. The root agent assigns work to a sub-agent and gets results back while staying in control.

Use handoffs to transfer the entire conversation to a different agent. The new agent takes over completely.
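
A sketch contrasting the two (agent names and instructions are illustrative):

agents:
  root:
    model: anthropic/claude-sonnet-4-5
    description: Coordinator
    instruction: Delegate research tasks; hand off billing questions entirely
    sub_agents: [researcher] # delegates work, stays in control
    handoffs: [billing] # transfers the conversation completely

  researcher:
    model: anthropic/claude-sonnet-4-5
    description: Research assistant
    instruction: Gather information and report back

  billing:
    model: anthropic/claude-sonnet-4-5
    description: Billing specialist
    instruction: Handle billing conversations end to end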

Commands

Commands are named prompts that users invoke with /command_name. They support JavaScript template literals, with ${env.VARIABLE} for environment variables:

commands:
  greet: "Say hello to ${env.USER}"
  analyze: "Analyze ${env.PROJECT_NAME || 'demo'}"

Run with: cagent run config.yaml /greet

Structured output

Constrain responses to a JSON schema (OpenAI and Gemini only):

structured_output:
  name: code_analysis
  strict: true
  schema:
    type: object
    properties:
      issues:
        type: array
        items: { ... }
    required: [issues]

Models

Property             Type     Description                                      Required
provider             string   openai, anthropic, google, dmr                   Yes
model                string   Model name                                       Yes
temperature          float    Randomness (0.0-2.0)                             No
max_tokens           integer  Maximum response length                          No
top_p                float    Nucleus sampling (0.0-1.0)                       No
frequency_penalty    float    Repetition penalty (-2.0 to 2.0, OpenAI only)    No
presence_penalty     float    Topic penalty (-2.0 to 2.0, OpenAI only)         No
base_url             string   Custom API endpoint                              No
parallel_tool_calls  boolean  Enable parallel tool execution (default: true)   No
token_key            string   Authentication token key                         No
track_usage          boolean  Track token usage                                No
thinking_budget      mixed    Reasoning effort (provider-specific)             No
provider_opts        object   Provider-specific options                        No
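
For example, a model entry exercising several of these properties (values are illustrative):

models:
  precise_gpt:
    provider: openai
    model: gpt-5
    temperature: 0.2
    top_p: 0.9
    max_tokens: 4096
    frequency_penalty: 0.2
    parallel_tool_calls: false
    track_usage: true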

Alloy models

Use multiple models in rotation by separating names with commas:

model: anthropic/claude-sonnet-4-5,openai/gpt-5

Thinking budget

Controls reasoning depth. Configuration varies by provider:

  • OpenAI: String values - minimal, low, medium, high
  • Anthropic: Integer token budget (1024-32768, must be less than max_tokens)
    • Set provider_opts.interleaved_thinking: true for tool use during reasoning
  • Gemini: Integer token budget (0 to disable, -1 for dynamic, max 24576)
    • Gemini 2.5 Pro: 128-32768, cannot disable (minimum 128)

# OpenAI
thinking_budget: low

# Anthropic
thinking_budget: 8192
provider_opts:
  interleaved_thinking: true

# Gemini
thinking_budget: 8192    # Fixed
thinking_budget: -1      # Dynamic
thinking_budget: 0       # Disabled

Docker Model Runner (DMR)

Run local models. If base_url is omitted, cagent auto-discovers the endpoint via the Docker Model plugin.

provider: dmr
model: ai/qwen3
max_tokens: 8192
base_url: http://localhost:12434/engines/llama.cpp/v1 # Optional

Pass llama.cpp options via provider_opts.runtime_flags (array, string, or multiline):

provider_opts:
  runtime_flags: ["--ngl=33", "--threads=8"]
  # or: runtime_flags: "--ngl=33 --threads=8"

Model config fields auto-map to runtime flags:

  • temperature → --temp
  • top_p → --top-p
  • max_tokens → --context-size

Explicit runtime_flags override auto-mapped flags.
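
For example, assuming the mappings above, temperature and max_tokens in this config are auto-mapped, while the explicit --temp in runtime_flags takes precedence:

models:
  local:
    provider: dmr
    model: ai/qwen3
    temperature: 0.7 # would auto-map to --temp=0.7
    max_tokens: 8192 # auto-maps to --context-size=8192
    provider_opts:
      runtime_flags: ["--temp=0.5"] # explicit flag overrides the auto-mapped --temp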

Speculative decoding for faster inference:

provider_opts:
  speculative_draft_model: ai/qwen3:0.6B-F16
  speculative_num_tokens: 16
  speculative_acceptance_rate: 0.8

Tools

Configure tools in the toolsets array. There are three types: built-in, MCP (local or remote), and Docker MCP Gateway.

Note

For detailed documentation of each toolset's capabilities, available tools, and specific configuration options, see the Toolsets reference.

All toolsets support common properties like tools (whitelist), defer (deferred loading), toon (output compression), env (environment variables), and instruction (usage guidance). See the Toolsets reference for details on these properties and what each toolset does.
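
As a sketch, the common properties applied to a single MCP toolset (assuming defer and toon are boolean flags, as their descriptions suggest):

toolsets:
  - type: mcp
    command: npx
    args: ["-y", "@modelcontextprotocol/server-filesystem", "/path/to/allowed/files"]
    tools: ["read_file"] # whitelist a single tool
    defer: true # assumed boolean: load the toolset only when needed
    toon: true # assumed boolean: compress tool output
    env:
      NODE_OPTIONS: "--max-old-space-size=8192"
    instruction: Only read files the user names explicitly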

Built-in tools

toolsets:
  - type: filesystem
  - type: shell
  - type: think
  - type: todo
    shared: true
  - type: memory
    path: ./memory.db

MCP tools

Local process:

- type: mcp
  command: npx
  args:
    ["-y", "@modelcontextprotocol/server-filesystem", "/path/to/allowed/files"]
  tools: ["read_file", "write_file"] # Optional: limit to specific tools
  env:
    NODE_OPTIONS: "--max-old-space-size=8192"

Remote server:

- type: mcp
  remote:
    url: https://mcp-server.example.com
    transport_type: sse
    headers:
      Authorization: Bearer token

Docker MCP Gateway

Containerized tools from Docker MCP Catalog:

- type: mcp
  ref: docker:duckduckgo

RAG

Retrieval-augmented generation for document knowledge bases. Define sources at the top level and reference them by name in agents.

rag:
  docs:
    docs: [./documents, ./README.md]
    strategies:
      - type: chunked-embeddings
        embedding_model: openai/text-embedding-3-small
        vector_dimensions: 1536
        database: ./embeddings.db

agents:
  root:
    rag: [docs]

Retrieval strategies

All strategies support chunking configuration. Chunk size and overlap are measured in characters (Unicode code points), not tokens. For example, with size: 1000 and overlap: 75, each chunk shares its last 75 characters with the start of the next chunk.

Chunked-embeddings

Direct semantic search using vector embeddings. Best for understanding intent, synonyms, and paraphrasing.

Field                             Type     Default
embedding_model                   string   -
database                          string   -
vector_dimensions                 integer  -
similarity_metric                 string   cosine
threshold                         float    0.5
limit                             integer  5
chunking.size                     integer  1000
chunking.overlap                  integer  75
chunking.respect_word_boundaries  boolean  true
chunking.code_aware               boolean  false

- type: chunked-embeddings
  embedding_model: openai/text-embedding-3-small
  vector_dimensions: 1536
  database: ./vector.db
  similarity_metric: cosine
  threshold: 0.5
  limit: 10
  chunking:
    size: 1000
    overlap: 100

Semantic-embeddings

LLM-enhanced semantic search. Uses a language model to generate rich semantic summaries of each chunk before embedding, capturing deeper meaning.

Field                             Type     Default
embedding_model                   string   -
chat_model                        string   -
database                          string   -
vector_dimensions                 integer  -
similarity_metric                 string   cosine
threshold                         float    0.5
limit                             integer  5
ast_context                       boolean  false
semantic_prompt                   string   -
chunking.size                     integer  1000
chunking.overlap                  integer  75
chunking.respect_word_boundaries  boolean  true
chunking.code_aware               boolean  false

- type: semantic-embeddings
  embedding_model: openai/text-embedding-3-small
  vector_dimensions: 1536
  chat_model: openai/gpt-5-mini
  database: ./semantic.db
  threshold: 0.3
  limit: 10
  chunking:
    size: 1000
    overlap: 100

BM25

Keyword-based search using the BM25 algorithm. Best for exact terms, technical jargon, and code identifiers.

Field                             Type     Default
database                          string   -
k1                                float    1.5
b                                 float    0.75
threshold                         float    0.0
limit                             integer  5
chunking.size                     integer  1000
chunking.overlap                  integer  75
chunking.respect_word_boundaries  boolean  true
chunking.code_aware               boolean  false

- type: bm25
  database: ./bm25.db
  k1: 1.5
  b: 0.75
  threshold: 0.3
  limit: 10
  chunking:
    size: 1000
    overlap: 100

Hybrid retrieval

Combine multiple strategies with fusion:

strategies:
  - type: chunked-embeddings
    embedding_model: openai/text-embedding-3-small
    vector_dimensions: 1536
    database: ./vector.db
    limit: 20
  - type: bm25
    database: ./bm25.db
    limit: 15

results:
  fusion:
    strategy: rrf # Options: rrf, weighted, max
    k: 60 # RRF smoothing parameter
  deduplicate: true
  limit: 5

Fusion strategies:

  • rrf: Reciprocal Rank Fusion (recommended, rank-based, no normalization needed)
  • weighted: Weighted combination (fusion.weights: {chunked-embeddings: 0.7, bm25: 0.3}; see the sketch below)
  • max: Maximum score across strategies
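
For example, a weighted-fusion sketch based on the weights syntax above:

results:
  fusion:
    strategy: weighted
    weights:
      chunked-embeddings: 0.7
      bm25: 0.3
  deduplicate: true
  limit: 5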

Reranking

Re-score results with a specialized model for improved relevance:

results:
  reranking:
    model: openai/gpt-5-mini
    top_k: 10 # Only rerank top K (0 = all)
    threshold: 0.3 # Minimum score after reranking
    criteria: | # Optional domain-specific guidance
      Prioritize official docs over blog posts
  limit: 5

DMR native reranking:

models:
  reranker:
    provider: dmr
    model: hf.co/ggml-org/qwen3-reranker-0.6b-q8_0-gguf

results:
  reranking:
    model: reranker

Code-aware chunking

For source code, use AST-based chunking. With semantic-embeddings, you can include AST metadata in the LLM prompts:

- type: semantic-embeddings
  embedding_model: openai/text-embedding-3-small
  vector_dimensions: 1536
  chat_model: openai/gpt-5-mini
  database: ./code.db
  ast_context: true # Include AST metadata in semantic prompts
  chunking:
    size: 2000
    code_aware: true # Enable AST-based chunking

RAG properties

Top-level RAG source:

Field       Type      Description
docs        []string  Document paths (supports glob patterns, respects .gitignore)
tool        object    Customize RAG tool name/description/instruction
strategies  []object  Retrieval strategies (see above for strategy-specific fields)
results     object    Post-processing (fusion, reranking, limits)
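
For example, a sketch customizing the RAG tool via the tool object; the sub-field names (name, description, instruction) are inferred from the description above, not confirmed by the source:

rag:
  docs:
    docs: [./documents]
    tool:
      name: search_docs # inferred sub-field names
      description: Search the project documentation
      instruction: Call this before answering documentation questions
    strategies:
      - type: bm25
        database: ./bm25.db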

Results:

Field                Type     Default
limit                integer  15
deduplicate          boolean  true
include_score        boolean  false
fusion.strategy      string   -
fusion.k             integer  60
fusion.weights       object   -
reranking.model      string   -
reranking.top_k      integer  0
reranking.threshold  float    0.5
reranking.criteria   string   ""
return_full_content  boolean  false
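
For example, a results block exercising the scoring and content fields above (values are illustrative):

results:
  limit: 10
  deduplicate: true
  include_score: true # attach relevance scores to results
  return_full_content: true # return full content (assumed semantics)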

Metadata

Documentation and sharing information:

Property  Type    Description
author    string  Author name
license   string  License (e.g., MIT, Apache-2.0)
readme    string  Usage documentation

metadata:
  author: Your Name
  license: MIT
  readme: |
    Description and usage instructions

Example configuration

Complete configuration demonstrating key features:

agents:
  root:
    model: claude
    description: Technical lead
    instruction: Coordinate development tasks and delegate to specialists
    sub_agents: [developer, reviewer]
    toolsets:
      - type: filesystem
      - type: mcp
        ref: docker:duckduckgo
    rag: [readmes]
    commands:
      status: "Check project status"

  developer:
    model: gpt
    description: Software developer
    instruction: Write clean, maintainable code
    toolsets:
      - type: filesystem
      - type: shell

  reviewer:
    model: claude
    description: Code reviewer
    instruction: Review for quality and security
    toolsets:
      - type: filesystem

models:
  gpt:
    provider: openai
    model: gpt-5

  claude:
    provider: anthropic
    model: claude-sonnet-4-5
    max_tokens: 64000

rag:
  readmes:
    docs: ["**/README.md"]
    strategies:
      - type: chunked-embeddings
        embedding_model: openai/text-embedding-3-small
        vector_dimensions: 1536
        database: ./embeddings.db
        limit: 10
      - type: bm25
        database: ./bm25.db
        limit: 10
    results:
      fusion:
        strategy: rrf
        k: 60
      limit: 5
