Core Concepts

Large language models are stateless by design. An LLM’s knowledge comes from two sources:

  1. Model weights - Fixed after training
  2. Context window - Ephemeral input provided at inference time

This means LLMs have no persistent memory between interactions. Each API call starts from scratch, with no ability to learn from past experiences or maintain state across sessions.

Stateful agents overcome this limitation by maintaining persistent memory and identity across all interactions.

A stateful agent has:

  • Persistent identity - Exists as a unique entity with continuity across sessions
  • Active memory formation - Autonomously decides what information to store and update
  • Accumulated state - Learns through experience rather than just model weights
  • Long-term context - Maintains knowledge beyond single conversation windows

Unlike traditional LLM applications where your code manages state, stateful agents actively manage their own memory using built-in tools to read, write, and search their persistent storage.

Traditional LLM applications are stateless - every interaction starts from scratch. Your application must:

  • Store all conversation history in your own database
  • Send the entire context with every API call
  • Implement memory and personalization logic yourself
  • Manually manage context window limits

With Letta’s stateful agents, all of this is handled for you. The agent maintains its own persistent state, intelligently manages its context window, and learns from every interaction without requiring you to build a complex state management layer.

The difference between stateful agents and traditional LLM APIs is fundamental:

Traditional APIs (stateless): No memory between requests. Your app manages everything.

Letta (stateful): Agents maintain their own persistent state. You only send new messages.

With stateless APIs, there is no state persistence between requests. The client must send the entire conversation history with every call.

flowchart LR
    Client["Client Application"]
    API["LLM API
(OpenAI, Anthropic, etc)"]
    Client -->|"Send: msg1"| API
    API -->|"Return: response1"| Client

The client must send the full conversation history with each request:

  • Request 2: [msg1, response1, msg2]
  • Request 3: [msg1, response1, msg2, response2, msg3]

Letta maintains agent state on the server and persists it to a database. Clients only send new messages, and the server handles all state management.

flowchart LR
    Client["Client Application"]
    Server["Letta Server"]
    DB[("Persistent
Database")]
    Client -->|"Send: msg1"| Server
    Server <-->|"Load/Save State"| DB
    Server -->|"Return: response1"| Client

The client only sends new messages:

  • Request 2: [msg2]
  • Request 3: [msg3]
| Aspect | Traditional (Stateless) | Letta (Stateful) |
| --- | --- | --- |
| State management | Client-side | Server-side |
| Request format | Send full conversation history | Send only new messages |
| Memory | None (ephemeral) | Persistent database |
| Context limit | Hard limit, then fails | Intelligent management |
| Agent identity | None | Each agent has a unique ID |
| Long conversations | Expensive & brittle | Scales indefinitely |
| Personalization | App must manage | Built-in memory blocks |
| Multi-session | Requires external DB | Native support |

Stateless API (e.g., OpenAI):

import openai

# You must send the entire conversation every time
messages = [
    {"role": "user", "content": "Hello, I'm Sarah"},
    {"role": "assistant", "content": "Hi Sarah!"},
    {"role": "user", "content": "What's my name?"},  # ← New message
]

# Send everything
response = openai.chat.completions.create(
    model="gpt-4",
    messages=messages,  # ← Full history required
)

# You must store and manage messages yourself
messages.append(response.choices[0].message)

Stateful API (Letta):

# Agent already knows context
response = client.agents.messages.send(
    agent.id,
    input="What's my name?",  # ← New message only
)

# Agent remembers Sarah from its memory blocks
# No need to send previous messages
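
For context, here is how such an agent might be created in the first place. This is a minimal sketch assuming the letta-client Python SDK and a locally running Letta server; the model and embedding handles are illustrative:

from letta_client import Letta

client = Letta(base_url="http://localhost:8283")

# Create a persistent agent once; afterwards its ID is all you need
agent = client.agents.create(
    model="openai/gpt-4o-mini",                 # illustrative handle
    embedding="openai/text-embedding-3-small",  # illustrative handle
    memory_blocks=[
        # Seed memory that the agent can later edit itself
        {"label": "human", "value": "The user's name is Sarah."},
        {"label": "persona", "value": "I am a helpful assistant."},
    ],
)
print(agent.id)  # stable identifier, persisted server-side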

Letta treats agents as persistent services, not ephemeral library calls.

In traditional frameworks, agents are objects that live in your application’s memory and disappear when your app stops. In Letta, agents are independent services that:

  • Continue to exist when your application isn’t running
  • Maintain state in a database
  • Can be accessed from multiple applications simultaneously
  • Run autonomously on the server

You interact with Letta agents through REST APIs:

POST /agents/{agent_id}/messages
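
As a sketch, the same call can be made over plain HTTP with any client. The /v1 prefix, port, and payload shape are assumptions based on a default local server, and the agent ID is a placeholder:

import requests

BASE_URL = "http://localhost:8283/v1"  # assumed default local server
agent_id = "agent-xxxxxxxx"            # placeholder: use a real agent ID

# Send only the new message; the server loads the agent's state itself
resp = requests.post(
    f"{BASE_URL}/agents/{agent_id}/messages",
    json={"messages": [{"role": "user", "content": "What's my name?"}]},
)
print(resp.json())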

This architecture enables:

  • Multi-user applications - Each user gets their own persistent agent
  • Agent-to-agent communication - Agents can message each other
  • Background processing - Agents can continue working while your app is offline
  • Deployment flexibility - Scale agents independently from your application

In Letta, all state is persisted automatically:

  • Agent memory (both memory blocks and archival)
  • Message history
  • Tool configurations
  • Agent state and context

Because everything is persisted:

  • Agents can be paused and resumed at any time
  • You can reload agents across different machines (see the sketch after this list)
  • State is never lost due to application restarts
  • Long conversations don’t degrade performance
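
A minimal sketch of reconnecting to an existing agent, assuming the letta-client SDK and an agent ID saved from an earlier session:

from letta_client import Letta

# Possibly a different process, machine, or application entirely
client = Letta(base_url="http://localhost:8283")

agent_id = "agent-xxxxxxxx"  # placeholder: ID saved from an earlier session
agent = client.agents.retrieve(agent_id=agent_id)

# The agent resumes with its memory and message history intact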

Unlike RAG systems that passively retrieve documents, Letta agents actively manage their own memory. Agents use built-in tools to:

  • Edit their memory blocks when learning new information
  • Insert facts into archival memory for long-term storage
  • Search their past conversations when context is needed
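
To make this concrete, here is an illustrative, heavily simplified sketch of what a self-editing memory tool conceptually does. This is not Letta's actual implementation; the in-memory dict stands in for persistent storage:

# Conceptual sketch only: a toy version of a memory-editing tool
memory_blocks = {"human": "The user's name is Sarah."}

def core_memory_replace(label: str, old_text: str, new_text: str) -> str:
    """Tool the agent can call to rewrite part of a memory block."""
    memory_blocks[label] = memory_blocks[label].replace(old_text, new_text)
    return f"Updated block '{label}'."

# The agent might invoke this after learning something new:
core_memory_replace("human", "Sarah", "Sarah, who prefers 'Sara'")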

This active memory management enables agents to:

  • Learn user preferences over time
  • Maintain consistent personality across sessions
  • Build long-term relationships with users
  • Continuously improve from interactions

Learn more about memory →

Letta doesn’t have the concept of threads or sessions. Instead, there are only stateful agents with a single perpetual message history.

%%{init: {'flowchart': {'rankDir': 'LR'}}}%%
flowchart LR
    subgraph Traditional["Thread-Based Agents"]
        direction TB
        llm1[LLM] --> thread1["Thread 1
        --------
        Ephemeral
        Session"]
        llm1 --> thread2["Thread 2
        --------
        Ephemeral
        Session"]
        llm1 --> thread3["Thread 3
        --------
        Ephemeral
        Session"]
    end

    Traditional ~~~ Letta

    subgraph Letta["Letta Stateful Agents"]
        direction TB
        llm2[LLM] --> agent["Single Agent
        --------
        Persistent Memory"]
        agent --> db[(PostgreSQL)]
        db -->|"Learn & Update"| agent
    end

    class thread1,thread2,thread3 session
    class agent agent

Why no threads? Letta is built on the principle that all interactions should be part of persistent memory, not ephemeral sessions. This enables:

  • Continuous learning across all conversations
  • True long-term memory and relationships
  • No context loss when “starting a new thread”

For multi-user applications, we recommend creating one agent per user. Each agent maintains its own persistent memory about that specific user.
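
A minimal sketch of that pattern, assuming the letta-client SDK; the dict mapping users to agent IDs stands in for your own database:

from letta_client import Letta

client = Letta(base_url="http://localhost:8283")
user_agents: dict[str, str] = {}  # user_id -> agent_id (use a real DB)

def agent_for_user(user_id: str) -> str:
    """Create a dedicated agent the first time a user appears."""
    if user_id not in user_agents:
        agent = client.agents.create(
            model="openai/gpt-4o-mini",                 # illustrative
            embedding="openai/text-embedding-3-small",  # illustrative
            memory_blocks=[{"label": "human", "value": f"User: {user_id}"}],
        )
        user_agents[user_id] = agent.id
    return user_agents[user_id]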

If you need conversation templates or starting points, use agent templates to create new agents with pre-configured state.

The LLM Operating System is the infrastructure layer that manages agent execution, state, and memory. This includes:

  • Agent runtime - Manages tool execution and the reasoning loop
  • Memory layer - Handles context window management and persistence
  • Stateful layer - Coordinates state across database, cache, and execution

Letta’s architecture is inspired by the MemGPT research paper, which introduced these concepts.

The path to more capable AI systems isn’t just about larger models or longer context windows. Stateful agents represent a fundamental shift: agents that learn through accumulated experience, build lasting relationships with users, and continuously improve without retraining.

With stateful agents, you can build:

  • Personalized assistants that adapt to individual users over time
  • Learning systems that improve from feedback and interactions
  • Long-term relationships where agents develop deep context about users and tasks
  • Autonomous services that operate independently and maintain their own knowledge

This architectural shift—from stateless function calls to stateful agent services—enables a new class of AI applications that weren’t possible with traditional LLM APIs.