Use containers for generative AI development
Prerequisites
Complete Containerize a generative AI application.
Overview
In this section, you'll learn how to set up a development environment to access all the services that your generative AI (GenAI) application needs. This includes:
- Adding a local database
- Adding a local or remote LLM service
Note
You can see more samples of containerized GenAI applications in the GenAI Stack demo applications.
Add a local database
You can use containers to set up local services, like a database. In this section, you'll update the compose.yaml file to define a database service. In addition, you'll specify an environment variables file to load the database connection information rather than entering it manually every time.
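As a rough illustration of what an environment file provides, the following Python sketch parses simple KEY=VALUE lines and fills in ${NAME} placeholders, similar in spirit to how the Compose file in this section builds NEO4J_AUTH from NEO4J_USERNAME and NEO4J_PASSWORD. It is a simplification and not Compose's actual parser: the helper name parse_env_text and the sample credentials are made up for the example, and quoting, default values, and other interpolation features are not handled.

```python
from string import Template


def parse_env_text(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


# Hypothetical file contents; the real .env values will differ.
sample_env = """
# Neo4j connection
NEO4J_USERNAME=neo4j
NEO4J_PASSWORD=example-password
"""

env = parse_env_text(sample_env)

# Fill the ${NAME} placeholders the way a value such as NEO4J_AUTH is composed.
neo4j_auth = Template("${NEO4J_USERNAME}/${NEO4J_PASSWORD}").substitute(env)
print(neo4j_auth)
```

Keeping the credentials in one file means the database container and the application container read the same values, so they cannot drift apart.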
To run the database service:
In the cloned repository's directory, rename the env.example file to .env. This file contains the environment variables that the containers will use.
In the cloned repository's directory, open the compose.yaml file in an IDE or text editor.
In the compose.yaml file, add the following:
- Add instructions to run a Neo4j database
- Specify the environment file under the server service in order to pass in the environment variables for the connection
The following is the updated compose.yaml file. All comments have been removed.
    services:
      server:
        build:
          context: .
        ports:
          - 8000:8000
        env_file:
          - .env
        depends_on:
          database:
            condition: service_healthy
      database:
        image: neo4j:5.11
        ports:
          - "7474:7474"
          - "7687:7687"
        environment:
          - NEO4J_AUTH=${NEO4J_USERNAME}/${NEO4J_PASSWORD}
        healthcheck:
          test: ["CMD-SHELL", "wget --no-verbose --tries=1 --spider localhost:7474 || exit 1"]
          interval: 5s
          timeout: 3s
          retries: 5
Note
To learn more about Neo4j, see the Neo4j Official Docker Image.
Run the application. Inside the docker-genai-sample directory, run the following command in a terminal.
$ docker compose up --build
Access the application. Open a browser and view the application at http://localhost:8000. You should see a simple Streamlit application. Note that asking questions about a PDF will cause the application to fail because the LLM service specified in the .env file isn't running yet.
Stop the application. In the terminal, press ctrl+c to stop the application.
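The healthcheck settings in the Compose file (probe every 5 seconds, give each probe 3 seconds, allow 5 retries) amount to a retry loop, and the server service waits for it to succeed because of the service_healthy condition. The following Python sketch mimics that logic from the host. It only approximates Docker's behavior, and the function names wait_until_healthy and http_probe are hypothetical helpers, not part of Docker or the sample application.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_probe(url: str, timeout: float = 3.0) -> bool:
    """Return True if the URL answers with a non-error status within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (urllib.error.URLError, OSError):
        return False


def wait_until_healthy(
    probe: Callable[[], bool],
    interval: float = 5.0,
    retries: int = 5,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Return True once probe() succeeds, or False after `retries` failed attempts."""
    for attempt in range(1, retries + 1):
        if probe():
            return True
        if attempt < retries:
            sleep(interval)  # wait before the next attempt
    return False
```

For example, calling wait_until_healthy(lambda: http_probe("http://localhost:7474")) while the stack is starting would report whether the Neo4j HTTP port became reachable within the allowed attempts.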
Add a local or remote LLM service
The sample application supports both Ollama and OpenAI. This guide provides instructions for the following scenarios:
- Run Ollama in a container
- Run Ollama outside of a container
- Use OpenAI
While all platforms can use any of the previous scenarios, the performance and GPU support may vary. You can use the following guidelines to help you choose the appropriate option:
- Run Ollama in a container if you're on Linux with a native installation of Docker Engine, or on Windows 10/11 with Docker Desktop, and you have a CUDA-supported GPU and at least 8 GB of RAM.
- Run Ollama outside of a container if you're on an Apple silicon Mac.
- Use OpenAI if the previous two scenarios don't apply to you.
Choose one of the following options for your LLM service.
When running Ollama in a container, you should have a CUDA-supported GPU. While you can run Ollama in a container without a supported GPU, the performance may not be acceptable. Only Linux and Windows 11 support GPU access to containers.
To run Ollama in a container and provide GPU access:
Install the prerequisites.
- For Docker Engine on Linux, install the NVIDIA Container Toolkit.
- For Docker Desktop on Windows 10/11, install the latest NVIDIA driver and make sure you are using the WSL2 backend.
Add the Ollama service and a volume in your compose.yaml. The following is the updated compose.yaml:
    services:
      server:
        build:
          context: .
        ports:
          - 8000:8000
        env_file:
          - .env
        depends_on:
          database:
            condition: service_healthy
      database:
        image: neo4j:5.11
        ports:
          - "7474:7474"
          - "7687:7687"
        environment:
          - NEO4J_AUTH=${NEO4J_USERNAME}/${NEO4J_PASSWORD}
        healthcheck:
          test: ["CMD-SHELL", "wget --no-verbose --tries=1 --spider localhost:7474 || exit 1"]
          interval: 5s
          timeout: 3s
          retries: 5
      ollama:
        image: ollama/ollama:latest
        ports:
          - "11434:11434"
        volumes:
          - ollama_volume:/root/.ollama
        deploy:
          resources:
            reservations:
              devices:
                - driver: nvidia
                  count: all
                  capabilities: [gpu]
    volumes:
      ollama_volume:
Note
For more details about the Compose instructions, see Turn on GPU access with Docker Compose.
Add the ollama-pull service to your compose.yaml file. This service uses the docker/genai:ollama-pull image, based on the GenAI Stack's pull_model.Dockerfile. The service will automatically pull the model for your Ollama container. The following is the updated section of the compose.yaml file:
    services:
      server:
        build:
          context: .
        ports:
          - 8000:8000
        env_file:
          - .env
        depends_on:
          database:
            condition: service_healthy
          ollama-pull:
            condition: service_completed_successfully
      ollama-pull:
        image: docker/genai:ollama-pull
        env_file:
          - .env
      # ...
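The depends_on entries determine the order in which Compose brings services up: the server waits for the database to be healthy and for ollama-pull to finish. As a small illustration of that ordering, the following Python sketch sorts the services by their dependencies. It is not how Compose is implemented, and because the rest of the ollama-pull definition is elided in the excerpt, the sketch assumes that service has no dependencies of its own.

```python
from graphlib import TopologicalSorter

# Each service maps to the services it waits for (its depends_on entries).
# Assumption: ollama-pull lists no dependencies, since its full definition
# is not shown in the excerpt.
depends_on = {
    "server": {"database", "ollama-pull"},
    "database": set(),
    "ollama-pull": set(),
}

# static_order() yields every service after all of the services it depends on.
start_order = list(TopologicalSorter(depends_on).static_order())
print(start_order)
```

The database and ollama-pull services can appear in either order, but the server always comes last.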
To run Ollama outside of a container:
- Install and run Ollama on your host machine.
- Update the OLLAMA_BASE_URL value in your .env file to http://host.docker.internal:11434.
- Pull the model to Ollama using the following command.
$ ollama pull llama2
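To confirm that Ollama is reachable and that the model was pulled, you could query Ollama's /api/tags endpoint, which lists the locally available models. The following Python sketch shows one way to do that from the host. The helper names fetch_tags and has_model are hypothetical, and the sketch assumes Ollama listens on its default port 11434.

```python
import json
import urllib.request


def fetch_tags(base_url: str = "http://localhost:11434", timeout: float = 5.0) -> dict:
    """Request the list of locally available models from a running Ollama."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as response:
        return json.load(response)


def has_model(tags: dict, model: str) -> bool:
    """Return True if `model` is listed, with or without a tag suffix such as :latest."""
    for entry in tags.get("models", []):
        name = entry.get("name", "")
        if name == model or name.split(":", 1)[0] == model:
            return True
    return False
```

With Ollama running, has_model(fetch_tags(), "llama2") would indicate whether the pull succeeded. From inside a container, the base URL would be the OLLAMA_BASE_URL value from the .env file rather than localhost.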
Important
Using OpenAI requires an OpenAI account. OpenAI is a third-party hosted service and charges may apply.
To use OpenAI:
- Update the LLM value in your .env file to gpt-3.5.
- Uncomment and update the OPENAI_API_KEY value in your .env file to your OpenAI API key.
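Because the LLM choice and its connection settings live in the same .env file, a quick consistency check can catch a missing value before you start the stack. The following Python sketch is one possible check. The function name check_llm_settings is hypothetical, and the rule that values starting with "gpt" refer to OpenAI is an assumption made for the example, not something taken from the sample application's code.

```python
def check_llm_settings(env: dict[str, str]) -> list[str]:
    """Return a list of problems found in the LLM-related settings."""
    problems: list[str] = []
    llm = env.get("LLM", "").strip()
    if not llm:
        problems.append("LLM is not set")
    elif llm.startswith("gpt"):
        # Assumption: "gpt..." values select OpenAI, which needs an API key.
        if not env.get("OPENAI_API_KEY", "").strip():
            problems.append("OPENAI_API_KEY is required when LLM is an OpenAI model")
    else:
        # Otherwise assume an Ollama model, which needs a reachable base URL.
        if not env.get("OLLAMA_BASE_URL", "").strip():
            problems.append("OLLAMA_BASE_URL is required when LLM is an Ollama model")
    return problems
```

An empty list means the settings are consistent under these assumptions. For example, check_llm_settings({"LLM": "gpt-3.5"}) reports the missing API key.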
Run your GenAI application
At this point, you have the following services in your Compose file:
- Server service for your main GenAI application
- Database service to store vectors in a Neo4j database
- (optional) Ollama service to run the LLM
- (optional) Ollama-pull service to automatically pull the model for the Ollama service
To run all the services, run the following command in your docker-genai-sample directory:
$ docker compose up --build
If your Compose file has the ollama-pull service, it may take several minutes for the ollama-pull service to pull the model. The ollama-pull service will continuously update the console with its status. After pulling the model, the ollama-pull service container will stop and you can access the application.
Once the application is running, open a browser and access the application at http://localhost:8000.
Upload a PDF file, for example the Docker CLI Cheat Sheet, and ask a question about the PDF.
Depending on your system and the LLM service that you chose, it may take several minutes to answer. If you are using Ollama and the performance isn't acceptable, try using OpenAI.
Summary
In this section, you learned how to set up a development environment that provides access to all the services that your GenAI application needs.
Related information:
- Dockerfile reference
- Compose file reference
- Ollama Docker image
- Neo4j Official Docker Image
- GenAI Stack demo applications
Next steps
See samples of more GenAI applications in the GenAI Stack demo applications.