DMR REST API
Once Model Runner is enabled, new API endpoints are available. You can use these endpoints to interact with a model programmatically.
Determine the base URL
The base URL to interact with the endpoints depends on how you run Docker:
- Docker Desktop:
  - From containers: http://model-runner.docker.internal/
  - From host processes: http://localhost:12434/, assuming TCP host access is enabled on the default port (12434).
- Docker Engine:
  - From containers: http://172.17.0.1:12434/ (with 172.17.0.1 representing the host gateway address)
  - From host processes: http://localhost:12434/
Note: The 172.17.0.1 interface may not be available by default to containers within a Compose project. In this case, add an extra_hosts directive to your Compose service YAML:

extra_hosts:
  - "model-runner.docker.internal:host-gateway"

Then you can access the Docker Model Runner APIs at http://model-runner.docker.internal:12434/.
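As a quick sanity check of the base URL, you can query the model list (the /models endpoint is described in the next section). The sketch below assumes a host process with TCP host access enabled on the default port; from inside a container, use http://model-runner.docker.internal/models instead.

#!/bin/sh
# Quick check that Model Runner is reachable from the host.
curl http://localhost:12434/models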
Available DMR endpoints
- Create a model: POST /models/create
- List models: GET /models
- Get a model: GET /models/{namespace}/{name}
- Delete a local model: DELETE /models/{namespace}/{name}
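The sketch below pulls a model and then removes it again, from a host process with TCP host access enabled. The JSON body for /models/create is assumed here to take a from field naming the model to pull; check the Model Runner API reference for the exact request schema.

#!/bin/sh
# Pull (create) a local copy of a model.
# The {"from": "..."} body shape is an assumption, not confirmed by this page.
curl http://localhost:12434/models/create \
    -H "Content-Type: application/json" \
    -d '{"from": "ai/smollm2"}'

# Delete the local model again (namespace "ai", name "smollm2").
curl -X DELETE http://localhost:12434/models/ai/smollm2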
Available OpenAI endpoints
DMR supports the following OpenAI endpoints:
- GET /engines/llama.cpp/v1/models
- GET /engines/llama.cpp/v1/models/{namespace}/{name}
- POST /engines/llama.cpp/v1/chat/completions
- POST /engines/llama.cpp/v1/completions
- POST /engines/llama.cpp/v1/embeddings
To call these endpoints via a Unix socket (/var/run/docker.sock), prefix their path
with /exp/vDD4.40.
Note: You can omit llama.cpp from the path. For example: POST /engines/v1/chat/completions.
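The embeddings endpoint accepts the standard OpenAI embeddings request shape. Below is a minimal sketch from the host, assuming TCP host access is enabled and an embedding-capable model is available locally (the model name ai/mxbai-embed-large is an assumption; substitute one you have pulled):

#!/bin/sh
# Request embeddings via the OpenAI-compatible endpoint.
# The model name is an assumption; use an embedding-capable model you have pulled.
curl http://localhost:12434/engines/llama.cpp/v1/embeddings \
    -H "Content-Type: application/json" \
    -d '{
        "model": "ai/mxbai-embed-large",
        "input": "Docker Model Runner exposes OpenAI-compatible endpoints."
    }'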
REST API examples
Request from within a container
To call the chat/completions OpenAI endpoint from within another container using curl:
#!/bin/sh
curl http://model-runner.docker.internal/engines/llama.cpp/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "ai/smollm2",
        "messages": [
            {
                "role": "system",
                "content": "You are a helpful assistant."
            },
            {
                "role": "user",
                "content": "Please write 500 words about the fall of Rome."
            }
        ]
    }'

Request from the host using TCP
To call the chat/completions OpenAI endpoint from the host via TCP:

1. Enable the host-side TCP support from the Docker Desktop GUI, or via the Docker Desktop CLI. For example: docker desktop enable model-runner --tcp <port>. If you are running on Windows, also enable GPU-backed inference. See Enable Docker Model Runner.
2. Interact with it as documented in the previous section, using localhost and the correct port.
#!/bin/sh
curl http://localhost:12434/engines/llama.cpp/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "ai/smollm2",
        "messages": [
            {
                "role": "system",
                "content": "You are a helpful assistant."
            },
            {
                "role": "user",
                "content": "Please write 500 words about the fall of Rome."
            }
        ]
    }'

Request from the host using a Unix socket
To call the chat/completions OpenAI endpoint through the Docker socket from the host using curl:
#!/bin/sh
curl --unix-socket $HOME/.docker/run/docker.sock \
    localhost/exp/vDD4.40/engines/llama.cpp/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "ai/smollm2",
        "messages": [
            {
                "role": "system",
                "content": "You are a helpful assistant."
            },
            {
                "role": "user",
                "content": "Please write 500 words about the fall of Rome."
            }
        ]
    }'
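To extract only the generated text from any of these requests, you can pipe the response through jq. The sketch below assumes jq is installed and that the response follows the standard OpenAI chat completion shape:

#!/bin/sh
# Print only the assistant's reply from the chat completion response.
curl -s http://localhost:12434/engines/llama.cpp/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "ai/smollm2",
        "messages": [
            { "role": "user", "content": "Summarize the fall of Rome in one sentence." }
        ]
    }' | jq -r '.choices[0].message.content'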