Streaming agent responses

Messages from the Letta server can be streamed to the client. If you’re building a UI on the Letta API, enabling streaming allows your UI to update in real-time as the agent generates a response to an input message.

Letta supports two streaming modes: step streaming (default) and token streaming.

To enable streaming, use the /v1/agents/{agent_id}/messages/stream endpoint instead of /messages:

typescript
import { LettaClient } from "@letta-ai/letta-client";

const client = new LettaClient({ token: "YOUR_API_KEY" });

// Assumes `agent` was created earlier, e.g. via client.agents.create(...)

// Step streaming (default) - returns complete messages
const stream = await client.agents.messages.createStream(agent.id, {
  messages: [{ role: "user", content: "Hello!" }],
});

for await (const chunk of stream) {
  console.log(chunk); // Complete message objects
}

// Token streaming - returns partial chunks for real-time UX
const tokenStream = await client.agents.messages.createStream(agent.id, {
  messages: [{ role: "user", content: "Hello!" }],
  streamTokens: true, // Enable token streaming
});

for await (const chunk of tokenStream) {
  console.log(chunk); // Partial content chunks
}
Aspect | Step Streaming (default) | Token Streaming
What you get | Complete messages after each step | Partial chunks as tokens generate
When to use | Simple implementation | ChatGPT-like real-time UX
Reassembly needed | No | Yes (by message ID)
Message IDs | Unique per message | Same ID across chunks
Content format | Full text in each message | Incremental text pieces
Enable with | Default behavior | stream_tokens: true
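
Note that the table shows the raw API field name stream_tokens; the TypeScript SDK exposes the same option in camelCase as streamTokens. If you call the HTTP endpoint directly rather than through the SDK, the request might look like the sketch below (the base URL, auth header, and agent ID are placeholders, not values from this guide).

typescript
// Sketch of a direct HTTP call to the streaming endpoint.
// The base URL, auth header, and agent ID below are placeholders.
const agentId = "agent-123";
const response = await fetch(
  `https://api.letta.com/v1/agents/${agentId}/messages/stream`, // replace with your server's base URL
  {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: "Bearer YOUR_API_KEY", // assumed auth scheme
    },
    body: JSON.stringify({
      messages: [{ role: "user", content: "Hello!" }],
      stream_tokens: true, // raw API field; the SDK's streamTokens
    }),
  },
);

The response body is the Server-Sent Events stream described later in this guide.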

The messages you receive depend on your agent’s configuration:

With reasoning enabled (default):

  • Simple response: reasoning_message → assistant_message
  • With tool use: reasoning_message → tool_call_message → tool_return_message → reasoning_message → assistant_message

With reasoning disabled (reasoning=false):

  • Simple response: assistant_message
  • With tool use: tool_call_message → tool_return_message → assistant_message

The message types you may see in the stream:

  • reasoning_message: Agent’s internal thinking process (only when reasoning=true)
  • assistant_message: The actual response shown to the user
  • tool_call_message: Request to execute a tool
  • tool_return_message: Result from tool execution
  • stop_reason: Indicates end of response (end_turn)
  • usage_statistics: Token usage and step count metrics
You can control whether reasoning messages appear by configuring the agent:

typescript
// With reasoning (default) - includes reasoning_message events
const agent = await client.agents.create({
  model: "openai/gpt-4o-mini",
  // reasoning: true is the default
});

// Without reasoning - no reasoning_message events
const agentNoReasoning = await client.agents.create({
  model: "openai/gpt-4o-mini",
  reasoning: false, // Disable reasoning messages
});
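
To make the flows above concrete, here is a sketch of a single loop that branches on every message type listed earlier, assuming an agent (with at least one tool attached) was already created as in the snippets above. The exact property names on tool call and tool return chunks are assumptions; check the typed definitions in @letta-ai/letta-client for your SDK version.

typescript
import { LettaClient } from "@letta-ai/letta-client";
import type { LettaMessage } from "@letta-ai/letta-client/api/types";

const client = new LettaClient({ token: "YOUR_API_KEY" });

// Assumes `agent` was created as shown above
const stream = await client.agents.messages.createStream(agent.id, {
  messages: [{ role: "user", content: "What's the weather in SF?" }],
});

for await (const chunk of stream as AsyncIterable<LettaMessage>) {
  switch (chunk.messageType) {
    case "reasoning_message":
      console.log("Thinking:", (chunk as any).reasoning);
      break;
    case "assistant_message":
      console.log("Assistant:", (chunk as any).content);
      break;
    case "tool_call_message":
      // Property name is an assumption; inspect the chunk to confirm
      console.log("Tool call:", JSON.stringify((chunk as any).toolCall ?? chunk));
      break;
    case "tool_return_message":
      // Property name is an assumption; inspect the chunk to confirm
      console.log("Tool result:", JSON.stringify((chunk as any).toolReturn ?? chunk));
      break;
    case "stop_reason":
      console.log("Stop reason:", JSON.stringify(chunk));
      break;
    case "usage_statistics":
      console.log("Usage:", JSON.stringify(chunk));
      break;
  }
}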

Step streaming delivers complete messages after each agent step completes. This is the default behavior when you use the streaming endpoint.

  1. Agent processes your request through steps (reasoning, tool calls, generating responses)
  2. After each step completes, you receive a complete LettaMessage via SSE
  3. Each message can be processed immediately without reassembly
typescript
import { LettaClient } from "@letta-ai/letta-client";
import type { LettaMessage } from "@letta-ai/letta-client/api/types";

const client = new LettaClient({ token: "YOUR_API_KEY" });

// Assumes `agent` was created earlier
const stream = await client.agents.messages.createStream(agent.id, {
  messages: [{ role: "user", content: "What's 2+2?" }],
});

for await (const chunk of stream as AsyncIterable<LettaMessage>) {
  if (chunk.messageType === "reasoning_message") {
    console.log(`Thinking: ${(chunk as any).reasoning}`);
  } else if (chunk.messageType === "assistant_message") {
    console.log(`Response: ${(chunk as any).content}`);
  }
}
data: {"id":"msg-123","message_type":"reasoning_message","reasoning":"User is asking a simple math question."}
data: {"id":"msg-456","message_type":"assistant_message","content":"2 + 2 equals 4!"}
data: {"message_type":"stop_reason","stop_reason":"end_turn"}
data: {"message_type":"usage_statistics","completion_tokens":50,"total_tokens":2821}
data: [DONE]

Token streaming provides partial content chunks as they’re generated by the LLM, enabling a ChatGPT-like experience where text appears character by character.

  1. Set stream_tokens: true in your request (streamTokens in the TypeScript SDK)
  2. Receive multiple chunks with the same message ID
  3. Each chunk contains a piece of the content
  4. Client must accumulate chunks by ID to rebuild complete messages
typescript
import { LettaClient } from "@letta-ai/letta-client";
import type { LettaMessage } from "@letta-ai/letta-client/api/types";

const client = new LettaClient({ token: "YOUR_API_KEY" });

// Token streaming with reassembly
interface MessageAccumulator {
  type: string;
  content: string;
}

const messageAccumulators = new Map<string, MessageAccumulator>();

// Assumes `agent` was created earlier
const stream = await client.agents.messages.createStream(agent.id, {
  messages: [{ role: "user", content: "Tell me a joke" }],
  streamTokens: true, // Note: camelCase in the SDK (stream_tokens in the raw API)
});

for await (const chunk of stream as AsyncIterable<LettaMessage>) {
  if (chunk.id && chunk.messageType) {
    const msgId = chunk.id;
    const msgType = chunk.messageType;

    // Initialize accumulator for new messages
    if (!messageAccumulators.has(msgId)) {
      messageAccumulators.set(msgId, { type: msgType, content: "" });
    }
    const acc = messageAccumulators.get(msgId)!;

    // Extract this chunk's incremental piece based on message type
    let delta = "";
    if (msgType === "reasoning_message") {
      delta = (chunk as any).reasoning || "";
    } else if (msgType === "assistant_message") {
      delta = (chunk as any).content || "";
    }

    // Only accumulate if the type matches (in case types share IDs)
    if (acc.type === msgType) {
      acc.content += delta;
    }

    // Print just the new piece; a UI would re-render the message from acc.content
    process.stdout.write(delta);
  }
}
# Same ID across chunks of the same message
data: {"id":"msg-abc","message_type":"assistant_message","content":"Why"}
data: {"id":"msg-abc","message_type":"assistant_message","content":" did"}
data: {"id":"msg-abc","message_type":"assistant_message","content":" the"}
data: {"id":"msg-abc","message_type":"assistant_message","content":" scarecrow"}
data: {"id":"msg-abc","message_type":"assistant_message","content":" win"}
# ... more chunks with same ID
data: [DONE]

The accumulator pattern shown above works for both streaming modes:

  • Step streaming: Each message is complete (single chunk per ID)
  • Token streaming: Multiple chunks per ID need accumulation

This means you can write your client code once to handle both cases.
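
As a sketch of that "write once" idea, the helper below wraps the accumulator pattern in a single function that works for either mode, since a complete step-streamed message is just the special case of one chunk per ID. The names handleChunk and renderMessage are illustrative, not part of the SDK.

typescript
import type { LettaMessage } from "@letta-ai/letta-client/api/types";

interface MessageAccumulator {
  type: string;
  content: string;
}

const accumulators = new Map<string, MessageAccumulator>();

// Illustrative helper: works for both step and token streaming, because a
// complete message is just the degenerate case of a single chunk per ID.
function handleChunk(
  chunk: LettaMessage,
  renderMessage: (id: string, text: string) => void,
) {
  if (!chunk.id || !chunk.messageType) return;

  const acc = accumulators.get(chunk.id) ?? { type: chunk.messageType, content: "" };
  accumulators.set(chunk.id, acc);

  if (acc.type === chunk.messageType) {
    if (chunk.messageType === "reasoning_message") {
      acc.content += (chunk as any).reasoning || "";
    } else if (chunk.messageType === "assistant_message") {
      acc.content += (chunk as any).content || "";
    }
  }

  // Re-render the whole message each time: with step streaming this fires once
  // per message, with token streaming it fires per chunk with growing content.
  renderMessage(chunk.id, acc.content);
}

Inside your for await loop, call handleChunk(chunk, renderMessage) regardless of whether streamTokens is set.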

All streaming responses follow the Server-Sent Events (SSE) format:

  • Each event starts with data: followed by JSON
  • Stream ends with data: [DONE]
  • Empty lines separate events

Learn more about SSE format here.
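
If you consume the stream endpoint without the SDK (for example via fetch), a minimal SSE reader could look like the sketch below. It assumes the Fetch API (Node 18+ or a browser) and only handles the simple data: framing described above; a production client would also handle reconnects and multi-line data fields.

typescript
// Minimal SSE reader sketch (assumes Fetch API, Node 18+ or a browser).
// Only handles the `data: ...` framing described above.
async function* readSse(response: Response): AsyncGenerator<unknown> {
  const reader = response.body!.getReader();
  const decoder = new TextDecoder();
  let buffer = "";

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    buffer += decoder.decode(value, { stream: true });

    // Events are separated by blank lines
    const events = buffer.split("\n\n");
    buffer = events.pop() ?? ""; // keep the trailing partial event

    for (const event of events) {
      for (const line of event.split("\n")) {
        if (!line.startsWith("data: ")) continue;
        const payload = line.slice("data: ".length);
        if (payload === "[DONE]") return; // end of stream
        yield JSON.parse(payload);
      }
    }
  }
}

You would pass it the Response from a POST to /v1/agents/{agent_id}/messages/stream and iterate over the yielded events with for await.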

If your Letta server connects to multiple LLM providers, some may not support token streaming. Your client code will still work - the server will fall back to step streaming automatically when token streaming isn’t available.