DMR REST API

Once Model Runner is enabled, new API endpoints are available. You can use these endpoints to interact with a model programmatically.

Determine the base URL

The base URL to interact with the endpoints depends on how you run Docker:

Docker Desktop

  • From containers: http://model-runner.docker.internal/
  • From host processes: http://localhost:12434/, assuming TCP host access is enabled on the default port (12434).

Docker Engine

  • From containers: http://172.17.0.1:12434/, with 172.17.0.1 representing the host gateway address.
  • From host processes: http://localhost:12434/

Note

The 172.17.0.1 interface may not be available by default to containers within a Compose project. In this case, add an extra_hosts directive to your Compose service YAML:

extra_hosts:
  - "model-runner.docker.internal:host-gateway"

Then you can access the Docker Model Runner APIs at http://model-runner.docker.internal:12434/.
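
For example, from another service in the same Compose project, you can confirm that the endpoint is reachable by listing the local models with the GET /models endpoint described below (a minimal check):

#!/bin/sh

curl http://model-runner.docker.internal:12434/models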

Available DMR endpoints

  • Create a model:

    POST /models/create
  • List models:

    GET /models
  • Get a model:

    GET /models/{namespace}/{name}
  • Delete a local model:

    DELETE /models/{namespace}/{name}
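
For example, from within a container you can list, inspect, and remove local models with curl (a minimal sketch; it uses the ai/smollm2 model from the examples below as the {namespace}/{name} value):

#!/bin/sh

# List all models available locally
curl http://model-runner.docker.internal/models

# Inspect a single model (namespace "ai", name "smollm2")
curl http://model-runner.docker.internal/models/ai/smollm2

# Remove the local copy of the model
curl -X DELETE http://model-runner.docker.internal/models/ai/smollm2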

Available OpenAI endpoints

DMR supports the following OpenAI-compatible endpoints:

  • List models:

    GET /engines/llama.cpp/v1/models
  • Retrieve a model:

    GET /engines/llama.cpp/v1/models/{namespace}/{name}
  • Create a chat completion:

    POST /engines/llama.cpp/v1/chat/completions
  • Create a completion:

    POST /engines/llama.cpp/v1/completions
  • Create embeddings:

    POST /engines/llama.cpp/v1/embeddings

To call these endpoints via a Unix socket (/var/run/docker.sock), prefix their path with /exp/vDD4.40.

Note

You can omit llama.cpp from the path. For example: POST /engines/v1/chat/completions.
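
For instance, you can list the local models from the host without the llama.cpp segment in the path (a minimal sketch; it assumes TCP host access is enabled on the default port):

#!/bin/sh

curl http://localhost:12434/engines/v1/models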

REST API examples

Request from within a container

To call the chat/completions OpenAI endpoint from within another container using curl:

#!/bin/sh

curl http://model-runner.docker.internal/engines/llama.cpp/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "ai/smollm2",
        "messages": [
            {
                "role": "system",
                "content": "You are a helpful assistant."
            },
            {
                "role": "user",
                "content": "Please write 500 words about the fall of Rome."
            }
        ]
    }'

Request from the host using TCP

To call the chat/completions OpenAI endpoint from the host via TCP:

  1. Enable host-side TCP support from the Docker Desktop GUI, or via the Docker Desktop CLI. For example: docker desktop enable model-runner --tcp <port>.

    If you are running on Windows, also enable GPU-backed inference. See Enable Docker Model Runner.

  2. Interact with it as documented in the previous section, using localhost and the correct port.

#!/bin/sh

curl http://localhost:12434/engines/llama.cpp/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "ai/smollm2",
        "messages": [
            {
                "role": "system",
                "content": "You are a helpful assistant."
            },
            {
                "role": "user",
                "content": "Please write 500 words about the fall of Rome."
            }
        ]
    }'

Request from the host using a Unix socket

To call the chat/completions OpenAI endpoint through the Docker socket from the host using curl:

#!/bin/sh

curl --unix-socket $HOME/.docker/run/docker.sock \
    localhost/exp/vDD4.40/engines/llama.cpp/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "ai/smollm2",
        "messages": [
            {
                "role": "system",
                "content": "You are a helpful assistant."
            },
            {
                "role": "user",
                "content": "Please write 500 words about the fall of Rome."
            }
        ]
    }'
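
The response follows the standard OpenAI chat completions format, so you can extract just the generated text with a tool such as jq. For example, from the host over TCP (a minimal sketch; it assumes jq is installed and TCP host access is enabled):

#!/bin/sh

curl http://localhost:12434/engines/llama.cpp/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "ai/smollm2",
        "messages": [
            {
                "role": "user",
                "content": "Summarize the fall of Rome in one sentence."
            }
        ]
    }' | jq -r '.choices[0].message.content'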