vLLM
Setting up vLLM
- Download + install vLLM (a typical install command is sketched below)
- Launch a vLLM OpenAI-compatible API server, following the official vLLM documentation
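For the install step, a minimal sketch of a typical setup (assuming a Linux host with a CUDA-capable GPU and a recent Python; see the vLLM docs for your platform's exact requirements):

```
# create an isolated environment and install vLLM from PyPI
python -m venv vllm-env
source vllm-env/bin/activate
pip install vllm
```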
For example, if we want to use the model dolphin-2.2.1-mistral-7b from HuggingFace, we would run:
```
python -m vllm.entrypoints.openai.api_server \
  --model ehartford/dolphin-2.2.1-mistral-7b
```

vLLM will automatically download the model (if it's not already downloaded) and store it in your HuggingFace cache directory.
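To sanity-check that the server is up before pointing Letta at it, you can query the OpenAI-compatible `/v1/models` endpoint (this assumes vLLM's default port of 8000):

```
# should return a JSON model list that includes ehartford/dolphin-2.2.1-mistral-7b
curl http://localhost:8000/v1/models
```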
Enabling vLLM with Docker
To enable vLLM models when running the Letta server with Docker, set the VLLM_API_BASE environment variable.
macOS/Windows:
Since vLLM is running on the host network, you will need to use `host.docker.internal` to connect to the vLLM server instead of `localhost`.
```
# replace `~/.letta/.persist/pgdata` with wherever you want to store your agent data
docker run \
  -v ~/.letta/.persist/pgdata:/var/lib/postgresql/data \
  -p 8283:8283 \
  -e VLLM_API_BASE="http://host.docker.internal:8000" \
  letta/letta:latest
```
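If agents can't reach the model, one quick check is to query the vLLM server from inside the running container (this assumes the container has `curl` available and was started with `--name letta`; substitute the container name shown by `docker ps`):

```
# confirm the Letta container can reach the vLLM server on the host
docker exec -it letta curl http://host.docker.internal:8000/v1/models
```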
Linux:
Use `--network host` and `localhost`:
```
docker run \
  -v ~/.letta/.persist/pgdata:/var/lib/postgresql/data \
  --network host \
  -e VLLM_API_BASE="http://localhost:8000" \
  letta/letta:latest
```

See the self-hosting guide for more information on running Letta with Docker.
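If you prefer Docker Compose over a raw `docker run`, the macOS/Windows invocation above translates roughly into the following `docker-compose.yml` sketch (an illustrative example, not an official Letta config; adjust the volume path and port to your setup):

```
# sketch of a docker-compose.yml equivalent to the macOS/Windows command above
services:
  letta:
    image: letta/letta:latest
    ports:
      - "8283:8283"
    environment:
      - VLLM_API_BASE=http://host.docker.internal:8000
    volumes:
      - ~/.letta/.persist/pgdata:/var/lib/postgresql/data
```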