vLLM

  1. Download and install vLLM (a typical pip-based install is sketched just below this list)
  2. Launch a vLLM OpenAI-compatible API server, following the official vLLM documentation
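
For step 1, vLLM is typically installed with pip into a Python environment. The exact wheel you need can vary by platform and CUDA version, so treat this as a sketch and check the official vLLM installation docs:

Terminal window
# install vLLM into the current Python environment
# (GPU inference requires a compatible CUDA setup)
pip install vllm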

For example, if we want to use the model dolphin-2.2.1-mistral-7b from Hugging Face, we would run:

Terminal window
python -m vllm.entrypoints.openai.api_server \
  --model ehartford/dolphin-2.2.1-mistral-7b

vLLM will automatically download the model (if it’s not already downloaded) and store it in your Hugging Face cache directory.
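
Once the server is up, you can sanity-check it by listing the models it serves through the OpenAI-compatible API (assuming the default port 8000):

Terminal window
# the response should list ehartford/dolphin-2.2.1-mistral-7b
curl http://localhost:8000/v1/models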

To enable vLLM models when running the Letta server with Docker, set the VLLM_API_BASE environment variable to the base URL of your vLLM server.

macOS/Windows: Since vLLM is running on the host (outside the Letta container), you will need to use host.docker.internal instead of localhost to reach the vLLM server from inside the container.

Terminal window
# replace `~/.letta/.persist/pgdata` with wherever you want to store your agent data
docker run \
  -v ~/.letta/.persist/pgdata:/var/lib/postgresql/data \
  -p 8283:8283 \
  -e VLLM_API_BASE="http://host.docker.internal:8000" \
  letta/letta:latest

Linux: host.docker.internal is not available by default, so use --network host and connect via localhost:

Terminal window
docker run \
  -v ~/.letta/.persist/pgdata:/var/lib/postgresql/data \
  --network host \
  -e VLLM_API_BASE="http://localhost:8000" \
  letta/letta:latest
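
If you prefer to keep the port mapping on Linux instead of using host networking, Docker can map host.docker.internal to the host gateway with --add-host; this variant is a sketch, not from the Letta docs:

Terminal window
# Linux alternative: expose the host to the container as host.docker.internal
docker run \
  -v ~/.letta/.persist/pgdata:/var/lib/postgresql/data \
  -p 8283:8283 \
  --add-host host.docker.internal:host-gateway \
  -e VLLM_API_BASE="http://host.docker.internal:8000" \
  letta/letta:latest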

See the self-hosting guide for more information on running Letta with Docker.