Low-latency Agents (Legacy)
Low-latency agents optimize for minimal response time by using a constrained context window and aggressive memory management. They’re ideal for real-time applications like voice interfaces where latency matters more than context retention.
Architecture
Low-latency agents use a much smaller context window than standard MemGPT agents. This reduces time-to-first-token at the cost of a much shorter conversation history and smaller memory blocks. A sleep-time agent aggressively manages memory to keep only the most relevant information in context.
Key differences from MemGPT v2:
- Artificially constrained context window for faster response times
- More aggressive memory management with smaller memory blocks
- Optimized sleep-time agent tuned for minimal context size
- Prioritizes speed over comprehensive context retention
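To make the trade-off concrete, here is a minimal sketch of a token-budgeted message buffer. It is an illustration only, not Letta's implementation: the `ConstrainedBuffer` class, the `estimate_tokens` heuristic, and the oldest-first eviction policy are all hypothetical stand-ins. In a real low-latency agent, the sleep-time agent decides what stays in context.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Message:
    role: str
    content: str


def estimate_tokens(text: str) -> int:
    # Assumption: a crude stand-in for a real tokenizer (about 4 characters per token).
    return max(1, len(text) // 4)


@dataclass
class ConstrainedBuffer:
    """Hypothetical in-context message buffer with a hard token budget."""

    max_tokens: int
    messages: deque = field(default_factory=deque)
    evicted: list = field(default_factory=list)

    def total_tokens(self) -> int:
        return sum(estimate_tokens(m.content) for m in self.messages)

    def append(self, message: Message) -> None:
        self.messages.append(message)
        # Evict the oldest messages until the buffer fits the budget again.
        # The newest message is always kept, even if it alone exceeds the budget.
        while self.total_tokens() > self.max_tokens and len(self.messages) > 1:
            self.evicted.append(self.messages.popleft())


# Usage: a small budget keeps only the most recent turns in context.
buffer = ConstrainedBuffer(max_tokens=20)
for i in range(5):
    buffer.append(Message("user", f"message number {i} " * 2))

print(len(buffer.messages), "in context,", len(buffer.evicted), "evicted")
```

A smaller `max_tokens` means less text for the model to process on each turn, which is where the latency gain comes from. The cost is that evicted messages are no longer visible to the model unless something (here, nothing; in the architecture above, the sleep-time agent) summarizes them into memory.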
To learn more about how to use low-latency agents for voice applications, see our Voice Agents guide.
Creating Low-latency Agents
Use the voice_convo_agent agent type to create a low-latency agent.
Set enable_sleeptime to true to enable the sleep-time agent, which manages the memory state of the low-latency agent in the background.
Additionally, set initial_message_sequence to an empty array so that the conversation starts with a completely empty message buffer.
```typescript
import { LettaClient } from "@letta-ai/letta-client";

const client = new LettaClient({ token: "LETTA_API_KEY" });

// create the Letta agent
const agent = await client.agents.create({
  agentType: "voice_convo_agent",
  memoryBlocks: [
    { value: "Name: ?", label: "human" },
    { value: "You are a helpful assistant.", label: "persona" },
  ],
  model: "openai/gpt-4o-mini", // Use 4o-mini for speed
  embedding: "openai/text-embedding-3-small",
  enableSleeptime: true,
  initialMessageSequence: [],
});
```

```python
from letta_client import Letta

client = Letta(token="LETTA_API_KEY")

# create the Letta agent
agent = client.agents.create(
    agent_type="voice_convo_agent",
    memory_blocks=[
        {"value": "Name: ?", "label": "human"},
        {"value": "You are a helpful assistant.", "label": "persona"},
    ],
    model="openai/gpt-4o-mini",  # Use 4o-mini for speed
    embedding="openai/text-embedding-3-small",
    enable_sleeptime=True,
    initial_message_sequence=[],
)
```

```bash
curl -X POST https://api.letta.com/v1/agents \
  -H "Authorization: Bearer $LETTA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "agent_type": "voice_convo_agent",
    "memory_blocks": [
      { "value": "Name: ?", "label": "human" },
      { "value": "You are a helpful assistant.", "label": "persona" }
    ],
    "model": "openai/gpt-4o-mini",
    "embedding": "openai/text-embedding-3-small",
    "enable_sleeptime": true,
    "initial_message_sequence": []
  }'
```