Core Concepts
The Fundamental Limitation of LLMs
Large language models are stateless by design. An LLM’s knowledge comes from two sources:
- Model weights - Fixed after training
- Context window - Ephemeral input provided at inference time
This means LLMs have no persistent memory between interactions. Each API call starts from scratch, with no ability to learn from past experiences or maintain state across sessions.
What are Stateful Agents?
Stateful agents overcome this limitation by maintaining persistent memory and identity across all interactions.
A stateful agent has:
- Persistent identity - Exists as a unique entity with continuity across sessions
- Active memory formation - Autonomously decides what information to store and update
- Accumulated state - Learns through experience rather than just model weights
- Long-term context - Maintains knowledge beyond single conversation windows
Unlike traditional LLM applications where your code manages state, stateful agents actively manage their own memory using built-in tools to read, write, and search their persistent storage.
Why Statefulness Matters
Traditional LLM applications are stateless: every interaction starts from scratch. Your application must:
- Store all conversation history in your own database
- Send the entire context with every API call
- Implement memory and personalization logic yourself
- Manually manage context window limits
With Letta’s stateful agents, all of this is handled for you. The agent maintains its own persistent state, intelligently manages its context window, and learns from every interaction without requiring you to build a complex state management layer.
Stateful vs Stateless APIs
The difference between stateful agents and traditional LLM APIs is fundamental:
Traditional APIs (stateless): No memory between requests. Your app manages everything.
Letta (stateful): Agents maintain their own persistent state. You only send new messages.
Traditional Stateless API
With stateless APIs, there is no state persistence between requests. The client must send the entire conversation history with every call.
```mermaid
flowchart LR
    Client["Client Application"]
    API["LLM API<br/>(OpenAI, Anthropic, etc.)"]
    Client -->|"Send: msg1"| API
    API -->|"Return: response1"| Client
```
The client must send the full conversation history with each request:
- Request 2: [msg1, response1, msg2]
- Request 3: [msg1, response1, msg2, response2, msg3]
Letta Stateful API
Letta maintains agent state on the server and persists it to a database. Clients only send new messages, and the server handles all state management.
```mermaid
flowchart LR
    Client["Client Application"]
    Server["Letta Server"]
    DB[("Persistent<br/>Database")]
    Client -->|"Send: msg1"| Server
    Server <-->|"Load/Save State"| DB
    Server -->|"Return: response1"| Client
```
The client only sends new messages:
- Request 2: [msg2]
- Request 3: [msg3]
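The difference in payload growth can be sketched with a toy calculation (illustrative only, counting messages rather than tokens):

```python
# Toy illustration (no real API calls): with a stateless API, request n
# must carry all prior messages, so the total payload grows quadratically;
# with a stateful server, each request carries only the new message.
def stateless_total(num_requests):
    # Request n carries n user messages plus n-1 prior responses: 2n - 1.
    return sum(2 * n - 1 for n in range(1, num_requests + 1))

def stateful_total(num_requests):
    # Each request carries exactly one new message.
    return num_requests

print(stateless_total(10))  # 100  (grows as n^2)
print(stateful_total(10))   # 10   (grows as n)
```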
Key Differences
| Aspect | Traditional (Stateless) | Letta (Stateful) |
|---|---|---|
| State management | Client-side | Server-side |
| Request format | Send full conversation history | Send only new messages |
| Memory | None (ephemeral) | Persistent database |
| Context limit | Hard limit, then fails | Intelligent management |
| Agent identity | None | Each agent has unique ID |
| Long conversations | Expensive & brittle | Scales infinitely |
| Personalization | App must manage | Built-in memory blocks |
| Multi-session | Requires external DB | Native support |
Code Comparison
Stateless API (e.g., OpenAI):

```python
# You must send the entire conversation every time
messages = [
    {"role": "user", "content": "Hello, I'm Sarah"},
    {"role": "assistant", "content": "Hi Sarah!"},
    {"role": "user", "content": "What's my name?"},  # ← New message
]

# Send everything
response = openai.chat.completions.create(
    model="gpt-4",
    messages=messages,  # ← Full history required
)

# You must store and manage messages yourself
messages.append(response.choices[0].message)
```

Stateful API (Letta):

```python
# Agent already knows context
response = client.agents.messages.send(
    agent.id,
    input="What's my name?",  # ← New message only
)

# Agent remembers Sarah from its memory blocks
# No need to send previous messages
```

Agents as Services
Letta treats agents as persistent services, not ephemeral library calls.
In traditional frameworks, agents are objects that live in your application’s memory and disappear when your app stops. In Letta, agents are independent services that:
- Continue to exist when your application isn’t running
- Maintain state in a database
- Can be accessed from multiple applications simultaneously
- Run autonomously on the server
You interact with Letta agents through REST APIs:
```
POST /agents/{agent_id}/messages
```

This architecture enables:
- Multi-user applications - Each user gets their own persistent agent
- Agent-to-agent communication - Agents can message each other
- Background processing - Agents can continue working while your app is offline
- Deployment flexibility - Scale agents independently from your application
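As a sketch, the endpoint above could be called like this. The base URL, port, and payload shape are assumptions rather than the exact Letta wire format, and the snippet only constructs the request instead of sending it:

```python
import json
import urllib.request

# Hypothetical sketch: prepare a request to the messages endpoint shown
# above. The base URL and JSON body are assumptions, not the exact
# Letta wire format.
base_url = "http://localhost:8283"   # assumed local Letta server
agent_id = "agent-123"               # placeholder agent ID

body = json.dumps({"messages": [{"role": "user", "content": "Hello"}]}).encode()
req = urllib.request.Request(
    f"{base_url}/agents/{agent_id}/messages",
    data=body,
    headers={"Content-Type": "application/json"},
    method="POST",
)

# The request is built but not sent; sending would require a running server.
print(req.get_method(), req.full_url)
```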
Persistence by Default
In Letta, all state is persisted automatically:
- Agent memory (both memory blocks and archival)
- Message history
- Tool configurations
- Agent state and context
Because everything is persisted:
- Agents can be paused and resumed at any time
- You can reload agents across different machines
- State is never lost due to application restarts
- Long conversations don’t degrade performance
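A minimal sketch of the pause/resume idea, using a JSON file as a stand-in for Letta's database:

```python
import json
import os
import tempfile

# Toy illustration of persistence-by-default: agent state survives a
# "restart" because it round-trips through durable storage. Letta uses
# a database; a JSON file stands in here.
state = {"memory_blocks": {"human": "Name: Sarah"}, "messages": ["Hello"]}

path = os.path.join(tempfile.mkdtemp(), "agent_state.json")
with open(path, "w") as f:
    json.dump(state, f)          # persist the agent's state

del state                        # simulate an application restart

with open(path) as f:
    restored = json.load(f)      # reload the agent on any machine
print(restored["memory_blocks"]["human"])  # Name: Sarah
```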
Self-Editing Memory
Unlike RAG systems that passively retrieve documents, Letta agents actively manage their own memory. Agents use built-in tools to:
- Edit their memory blocks when learning new information
- Insert facts into archival memory for long-term storage
- Search their past conversations when context is needed
This enables agents to:
- Learn user preferences over time
- Maintain consistent personality across sessions
- Build long-term relationships with users
- Continuously improve from interactions
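A toy sketch of the idea; the class and method names below are illustrative, not Letta's actual tool API:

```python
# Toy sketch of self-editing memory (NOT the real Letta tool API):
# an agent-side store with the three kinds of operations described above.
class AgentMemory:
    def __init__(self):
        self.blocks = {"human": "", "persona": ""}  # in-context memory blocks
        self.archival = []                          # long-term fact store

    def core_memory_replace(self, block, old, new):
        """Edit a memory block when learning new information."""
        self.blocks[block] = self.blocks[block].replace(old, new)

    def archival_insert(self, fact):
        """Insert a fact into archival memory for long-term storage."""
        self.archival.append(fact)

    def archival_search(self, query):
        """Search stored facts (naive substring match for illustration)."""
        return [f for f in self.archival if query.lower() in f.lower()]

memory = AgentMemory()
memory.blocks["human"] = "Name: unknown"
memory.core_memory_replace("human", "unknown", "Sarah")
memory.archival_insert("Sarah prefers concise answers.")
print(memory.blocks["human"])             # Name: Sarah
print(memory.archival_search("concise"))  # ['Sarah prefers concise answers.']
```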
Agents vs Threads
Letta doesn’t have the concept of threads or sessions. Instead, there are only stateful agents with a single perpetual message history.
```mermaid
%%{init: {'flowchart': {'rankDir': 'LR'}}}%%
flowchart LR
    subgraph Traditional["Thread-Based Agents"]
        direction TB
        llm1[LLM] --> thread1["Thread 1<br/>Ephemeral Session"]
        llm1 --> thread2["Thread 2<br/>Ephemeral Session"]
        llm1 --> thread3["Thread 3<br/>Ephemeral Session"]
    end
    Traditional ~~~ Letta
    subgraph Letta["Letta Stateful Agents"]
        direction TB
        llm2[LLM] --> agent["Single Agent<br/>Persistent Memory"]
        agent --> db[(PostgreSQL)]
        db -->|"Learn & Update"| agent
    end
    class thread1,thread2,thread3 session
    class agent agent
```
Why no threads? Letta is built on the principle that all interactions should be part of persistent memory, not ephemeral sessions. This enables:
- Continuous learning across all conversations
- True long-term memory and relationships
- No context loss when “starting a new thread”
For multi-user applications, we recommend creating one agent per user. Each agent maintains its own persistent memory about that specific user.
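The one-agent-per-user pattern can be sketched as follows; `create_agent` is a placeholder for a real agent-creation call, not Letta's client API:

```python
# Sketch of the one-agent-per-user pattern: look up a user's agent,
# creating it on first contact. `create_agent` stands in for a real
# agent-creation API call.
agents_by_user = {}

def create_agent(user_id):
    return f"agent-for-{user_id}"  # placeholder for a real API call

def agent_for(user_id):
    # Reuse the same persistent agent for every session with this user.
    if user_id not in agents_by_user:
        agents_by_user[user_id] = create_agent(user_id)
    return agents_by_user[user_id]

print(agent_for("sarah"))                         # agent-for-sarah
print(agent_for("sarah") is agent_for("sarah"))   # same agent every time
```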
If you need conversation templates or starting points, use agent templates to create new agents with pre-configured state.
LLM OS
The LLM Operating System is the infrastructure layer that manages agent execution, state, and memory. This includes:
- Agent runtime - Manages tool execution and the reasoning loop
- Memory layer - Handles context window management and persistence
- Stateful layer - Coordinates state across database, cache, and execution
Letta’s architecture is inspired by the MemGPT research paper, which introduced these concepts.
Beyond Model Size
The path to more capable AI systems isn’t just about larger models or longer context windows. Stateful agents represent a fundamental shift: agents that learn through accumulated experience, build lasting relationships with users, and continuously improve without retraining.
With stateful agents, you can build:
- Personalized assistants that adapt to individual users over time
- Learning systems that improve from feedback and interactions
- Long-term relationships where agents develop deep context about users and tasks
- Autonomous services that operate independently and maintain their own knowledge
This architectural shift—from stateless function calls to stateful agent services—enables a new class of AI applications that weren’t possible with traditional LLM APIs.