
Agent Memory & Architecture

Letta agents solve the context window limitation of LLMs through context engineering across two tiers of memory: in-context (core) memory (including system instructions, read-write memory blocks, and conversation history), and out-of-context memory (older evicted conversation history and archival storage).

To learn more about the research origins, read the MemGPT research paper, or take the free LLM OS course on DeepLearning.ai.

```mermaid
graph LR
    subgraph CONTEXT[Context Window]
        SYS[System Instructions]
        CORE[Memory Blocks]
        MSGS[Messages]
    end

    RECALL[Recall Memory]
    ARCH[Archival Memory]

    CONTEXT <--> RECALL
    CONTEXT <--> ARCH
```

Your agent’s context window contains:

  • System instructions: Your agent’s base behavior and capabilities
  • Memory blocks: Persistent, always-visible information (persona, user info, working state, etc.)
  • Recent messages: Latest conversation history

When the context window fills up, older content moves to external memory that the agent can still search (see the sketch after this list):

  • Recall memory: Older messages searchable via conversation_search tool
  • Archival memory: Long-term semantic storage searchable via archival_memory_search tool
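
You can inspect the in-context tier directly. The following is a minimal sketch, assuming the blocks.list and messages.list methods of the current @letta-ai/letta-client and an existing agent ID; treat the method names and options as assumptions rather than confirmed API.

```typescript
import { LettaClient } from "@letta-ai/letta-client";

const client = new LettaClient({ token: "LETTA_API_KEY" });
const agentId = "<your-agent-id>"; // placeholder: ID of an existing agent

// Always-in-context memory blocks (persona, human, working state, ...)
// NOTE: blocks.list is assumed to match the current SDK surface.
const blocks = await client.agents.blocks.list(agentId);
for (const block of blocks) {
  console.log(`${block.label}: ${block.value}`);
}

// Recent message history still held in the context window
// NOTE: messages.list and its limit option are likewise assumptions.
const recent = await client.agents.messages.list(agentId, { limit: 10 });
console.log(`recent messages: ${recent.length}`);
```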

Letta’s agent architecture follows modern LLM patterns:

  • Native reasoning: Uses model’s built-in reasoning capabilities (Responses API for OpenAI, encrypted reasoning for other providers)
  • Direct messaging: Agents respond with assistant messages
  • Compatibility: Works with any LLM, tool calling not required
  • Self-directed termination: Agents decide when to continue or stop

This architecture is optimized for frontier models like GPT-5 and Claude Sonnet 4.5.
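
For example, a single send returns typed messages that separate the model's reasoning from its assistant reply. This is a sketch, continuing with the client and agentId from the inspection sketch above, and assuming the messageType values and fields (reasoning, content) used by the current SDK.

```typescript
// Sketch: send one message and inspect the typed responses.
// The messageType strings and fields below are assumptions about the current SDK.
const response = await client.agents.messages.create(agentId, {
  messages: [{ role: "user", content: "What do you remember about me?" }],
});

for (const message of response.messages) {
  if (message.messageType === "reasoning_message") {
    console.log("reasoning:", message.reasoning); // native reasoning trace
  } else if (message.messageType === "assistant_message") {
    console.log("assistant:", message.content); // direct assistant reply
  }
}
```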

Learn more about the architecture evolution →

Letta agents have tools to manage their own memory:

  • memory_insert - Insert text into a memory block
  • memory_replace - Replace specific text in a memory block
  • memory_rethink - Completely rewrite a memory block
  • conversation_search - Search prior conversation history
  • archival_memory_insert - Store facts and knowledge long-term
  • archival_memory_search - Query semantic storage

Learn more about memory tools →
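
For example, after the agent uses memory_replace to update its human block, you can read the block back from outside the agent. A sketch, assuming a blocks.retrieve(agentId, blockLabel) method in the current SDK and the client and agentId from the earlier sketch.

```typescript
// Sketch: read one memory block back after the agent has edited it.
// blocks.retrieve(agentId, blockLabel) is an assumption about the current SDK.
const humanBlock = await client.agents.blocks.retrieve(agentId, "human");
console.log(humanBlock.value); // reflects any memory_insert / memory_replace edits
```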

Agents are created with memory blocks that define their persistent context:

```typescript
import { LettaClient } from "@letta-ai/letta-client";

const client = new LettaClient({ token: "LETTA_API_KEY" });

const agent = await client.agents.create({
  model: "openai/gpt-4o-mini",
  embedding: "openai/text-embedding-3-small",
  memoryBlocks: [
    {
      label: "human",
      value: "The human's name is Chad. They like vibe coding.",
    },
    {
      label: "persona",
      value: "My name is Sam, the all-knowing sentient AI.",
    },
  ],
  tools: ["web_search", "run_code"],
});
```

When the context window fills up, Letta automatically:

  1. Compacts older messages into a recursive summary
  2. Moves the full message history to recall storage
  3. Keeps evicted messages searchable via the conversation_search tool

This happens transparently, so your agent maintains continuity across long conversations.
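
One way to see this in practice is to watch context usage grow over a long conversation. The sketch below assumes a context.retrieve(agentId) method returning a context window overview, and assumes the field names shown; both are assumptions about the current SDK rather than documented calls.

```typescript
// Sketch: check how full the context window is before summarization kicks in.
// context.retrieve and the field names are assumptions about the current SDK.
const overview = await client.agents.context.retrieve(agent.id);
console.log(
  `context tokens: ${overview.contextWindowSizeCurrent} / ${overview.contextWindowSizeMax}`
);
// Once the window nears its limit, older turns are summarized and moved to
// recall storage, where the agent can still find them via conversation_search.
```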

Agents can insert memories during conversations, or you can populate archival memory programmatically:

```typescript
// Insert a memory via the SDK
await client.agents.passages.insert(agent.id, {
  content: "The user prefers TypeScript over JavaScript for type safety.",
  tags: ["preferences", "languages"],
});

// The agent can now search this:
// archival_memory_search(query="language preferences")
```
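
To confirm what was stored, you can page through the agent's archival passages from the SDK. A sketch, assuming a passages.list method alongside the passages.insert call shown above.

```typescript
// Sketch: list archival passages to confirm the insert above landed.
// passages.list is an assumption alongside the documented passages.insert.
const passages = await client.agents.passages.list(agent.id);
console.log(`archival passages stored: ${passages.length}`);
```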

Learn more about archival memory →

Key concepts from the MemGPT research:

  • Self-editing memory: Agents actively manage their own memory
  • Memory hierarchy: In-context vs out-of-context storage
  • Tool-based memory management: Agents decide what to remember
  • Stateful agents: Persistent memory across all interactions

Read the MemGPT paper → Take the free course →

  • Memory Blocks: Deep dive into memory block structure
  • Archival Memory: Long-term semantic storage
  • Base Tools: Built-in tools for memory management
  • Context Engineering: Optimizing agent memory usage