# DMR REST API
Once Model Runner is enabled, new API endpoints are available. You can use these endpoints to interact with a model programmatically.
## Determine the base URL
The base URL to interact with the endpoints depends on how you run Docker:

If you run Docker Desktop:

- From containers: `http://model-runner.docker.internal/`
- From host processes: `http://localhost:12434/`, assuming TCP host access is enabled on the default port (12434).

If you run Docker Engine:

- From containers: `http://172.17.0.1:12434/` (with `172.17.0.1` representing the host gateway address)
- From host processes: `http://localhost:12434/`
Note: The `172.17.0.1` interface may not be available by default to containers within a Compose project. In this case, add an `extra_hosts` directive to your Compose service YAML:

```yaml
extra_hosts:
  - "model-runner.docker.internal:host-gateway"
```

Then you can access the Docker Model Runner APIs at `http://model-runner.docker.internal:12434/`.
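In a Compose file, that directive sits under the service that needs to reach Model Runner. The following is a minimal sketch; the service name `my-app` and its image are placeholders:

```yaml
services:
  my-app:
    image: alpine:latest   # placeholder image for illustration
    extra_hosts:
      # Map model-runner.docker.internal to the host gateway so this
      # container can reach the Model Runner API on the host.
      - "model-runner.docker.internal:host-gateway"
```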
## Available DMR endpoints
- Create a model: `POST /models/create`
- List models: `GET /models`
- Get a model: `GET /models/{namespace}/{name}`
- Delete a local model: `DELETE /models/{namespace}/{name}`
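For example, with TCP host access enabled on the default port, you could list local models and delete one from the host with `curl`. This is a minimal sketch; `ai/smollm2` is simply the model used in the examples further down:

```bash
#!/bin/sh

# List all local models (returns JSON).
curl http://localhost:12434/models

# Delete a local model, addressed as {namespace}/{name}.
curl -X DELETE http://localhost:12434/models/ai/smollm2
```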
## Available OpenAI endpoints

DMR supports the following OpenAI endpoints:
- `GET /engines/llama.cpp/v1/models`
- `GET /engines/llama.cpp/v1/models/{namespace}/{name}`
- `POST /engines/llama.cpp/v1/chat/completions`
- `POST /engines/llama.cpp/v1/completions`
- `POST /engines/llama.cpp/v1/embeddings`
To call these endpoints via a Unix socket (`/var/run/docker.sock`), prefix their path with `/exp/vDD4.40`.
Note: You can omit `llama.cpp` from the path. For example: `POST /engines/v1/chat/completions`.
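As an illustration of the embeddings endpoint, here is a sketch that assumes TCP host access on the default port, a standard OpenAI-style request body (`model` and `input`), and a model able to produce embeddings; `ai/smollm2` is a placeholder:

```bash
#!/bin/sh

# Request an embedding vector for a piece of text.
curl http://localhost:12434/engines/llama.cpp/v1/embeddings \
    -H "Content-Type: application/json" \
    -d '{
        "model": "ai/smollm2",
        "input": "The quick brown fox jumps over the lazy dog."
    }'
```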
## REST API examples

### Request from within a container

To call the `chat/completions` OpenAI endpoint from within another container using `curl`:
```bash
#!/bin/sh

curl http://model-runner.docker.internal/engines/llama.cpp/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "ai/smollm2",
        "messages": [
            {
                "role": "system",
                "content": "You are a helpful assistant."
            },
            {
                "role": "user",
                "content": "Please write 500 words about the fall of Rome."
            }
        ]
    }'
```
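Because this is an OpenAI endpoint, the response is a JSON chat completion object, so you can, for instance, extract only the reply text with `jq`. This sketch assumes the standard OpenAI response shape and that `jq` is available in the container:

```bash
#!/bin/sh

# Print only the assistant's reply text.
curl -s http://model-runner.docker.internal/engines/llama.cpp/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "ai/smollm2",
        "messages": [{"role": "user", "content": "Say hello."}]
    }' | jq -r '.choices[0].message.content'
```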
### Request from the host using TCP

To call the `chat/completions` OpenAI endpoint from the host via TCP:

1. Enable the host-side TCP support from the Docker Desktop GUI, or via the Docker Desktop CLI. For example: `docker desktop enable model-runner --tcp <port>`.

   If you are running on Windows, also enable GPU-backed inference. See Enable Docker Model Runner.

2. Interact with it as documented in the previous section using `localhost` and the correct port.
```bash
#!/bin/sh

curl http://localhost:12434/engines/llama.cpp/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "ai/smollm2",
        "messages": [
            {
                "role": "system",
                "content": "You are a helpful assistant."
            },
            {
                "role": "user",
                "content": "Please write 500 words about the fall of Rome."
            }
        ]
    }'
```
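If you want tokens as they are generated, the request can be adapted into a streaming call. This is a sketch that assumes the endpoint honours the standard OpenAI `stream` field; `curl -N` turns off output buffering so chunks print as they arrive:

```bash
#!/bin/sh

# Stream the completion as server-sent event chunks.
curl -N http://localhost:12434/engines/llama.cpp/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "ai/smollm2",
        "stream": true,
        "messages": [
            {"role": "user", "content": "Please write 500 words about the fall of Rome."}
        ]
    }'
```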
### Request from the host using a Unix socket

To call the `chat/completions` OpenAI endpoint through the Docker socket from the host using `curl`:
```bash
#!/bin/sh

curl --unix-socket $HOME/.docker/run/docker.sock \
    localhost/exp/vDD4.40/engines/llama.cpp/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "ai/smollm2",
        "messages": [
            {
                "role": "system",
                "content": "You are a helpful assistant."
            },
            {
                "role": "user",
                "content": "Please write 500 words about the fall of Rome."
            }
        ]
    }'
```
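The socket path above is the Docker Desktop location under the user's home directory. On a setup where the socket lives at `/var/run/docker.sock` (as mentioned earlier), the same call, assuming the same `/exp/vDD4.40` prefix applies, would be:

```bash
#!/bin/sh

# Same request, pointed at the system-wide Docker socket.
curl --unix-socket /var/run/docker.sock \
    localhost/exp/vDD4.40/engines/llama.cpp/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "ai/smollm2",
        "messages": [{"role": "user", "content": "Say hello."}]
    }'
```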