
Agent Memory & Architecture

Letta agents solve the context window limitation of LLMs through context engineering across two tiers of memory: in-context (core) memory (including system instructions, read-write memory blocks, and conversation history), and out-of-context memory (older evicted conversation history and archival storage).

To learn more about the research origins, read the MemGPT research paper, or take the free LLM OS course on DeepLearning.ai.

```mermaid
graph LR
    subgraph CONTEXT[Context Window]
        SYS[System Instructions]
        CORE[Memory Blocks]
        MSGS[Messages]
    end

    RECALL[Recall Memory]
    ARCH[Archival Memory]

    CONTEXT <--> RECALL
    CONTEXT <--> ARCH
```

Your agent’s context window contains:

  • System instructions: Your agent’s base behavior and capabilities
  • Memory blocks: Persistent, always-visible information (persona, user info, working state, etc.)
  • Recent messages: Latest conversation history

When the context window fills up, older content moves to external memory that the agent can still search (see the sketch after this list):

  • Recall memory: Older messages searchable via conversation_search tool
  • Archival memory: Long-term semantic storage searchable via archival_memory_search tool
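
You can inspect the in-context tier directly. The following is a minimal sketch, assuming the blocks.list and messages.list methods of the current @letta-ai/letta-client and an existing agent ID; treat the method names and options as assumptions rather than confirmed API.

```typescript
import { LettaClient } from "@letta-ai/letta-client";

const client = new LettaClient({ token: "LETTA_API_KEY" });
const agentId = "<your-agent-id>"; // placeholder: ID of an existing agent

// Always-in-context memory blocks (persona, human, working state, ...)
// NOTE: blocks.list is assumed to match the current SDK surface.
const blocks = await client.agents.blocks.list(agentId);
for (const block of blocks) {
  console.log(`${block.label}: ${block.value}`);
}

// Recent message history still held in the context window
// NOTE: messages.list and its limit option are likewise assumptions.
const recent = await client.agents.messages.list(agentId, { limit: 10 });
console.log(`recent messages: ${recent.length}`);
```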

Letta’s agent architecture follows modern LLM patterns:

  • Native reasoning: Uses model’s built-in reasoning capabilities (Responses API for OpenAI, encrypted reasoning for other providers)
  • Direct messaging: Agents respond with assistant messages
  • Compatibility: Works with any LLM, tool calling not required
  • Self-directed termination: Agents decide when to continue or stop

This architecture is optimized for frontier models like GPT-5 and Claude Sonnet 4.5.
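
For example, a single send returns typed messages that separate the model's reasoning from its assistant reply. This is a sketch, continuing with the client and agentId from the inspection sketch above, and assuming the messageType values and fields (reasoning, content) used by the current SDK.

```typescript
// Sketch: send one message and inspect the typed responses.
// The messageType strings and fields below are assumptions about the current SDK.
const response = await client.agents.messages.create(agentId, {
  messages: [{ role: "user", content: "What do you remember about me?" }],
});

for (const message of response.messages) {
  if (message.messageType === "reasoning_message") {
    console.log("reasoning:", message.reasoning); // native reasoning trace
  } else if (message.messageType === "assistant_message") {
    console.log("assistant:", message.content); // direct assistant reply
  }
}
```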

Learn more about the architecture evolution →

Letta agents have tools to manage their own memory:

  • memory_insert - Insert text into a memory block
  • memory_replace - Replace specific text in a memory block
  • memory_rethink - Completely rewrite a memory block
  • conversation_search - Search prior conversation history
  • archival_memory_insert - Store facts and knowledge long-term
  • archival_memory_search - Query semantic storage

Learn more about memory tools →
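
For example, after the agent uses memory_replace to update its human block, you can read the block back from outside the agent. A sketch, assuming a blocks.retrieve(agentId, blockLabel) method in the current SDK and the client and agentId from the earlier sketch.

```typescript
// Sketch: read one memory block back after the agent has edited it.
// blocks.retrieve(agentId, blockLabel) is an assumption about the current SDK.
const humanBlock = await client.agents.blocks.retrieve(agentId, "human");
console.log(humanBlock.value); // reflects any memory_insert / memory_replace edits
```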

Agents are created with memory blocks that define their persistent context:

```typescript
import { LettaClient } from "@letta-ai/letta-client";

const client = new LettaClient({ token: "LETTA_API_KEY" });

const agent = await client.agents.create({
  model: "openai/gpt-4o-mini",
  embedding: "openai/text-embedding-3-small",
  memoryBlocks: [
    {
      label: "human",
      value: "The human's name is Chad. They like vibe coding.",
    },
    {
      label: "persona",
      value: "My name is Sam, the all-knowing sentient AI.",
    },
  ],
  tools: ["web_search", "run_code"],
});
```

When the context window fills up, Letta automatically:

  1. Compacts older messages into a recursive summary
  2. Moves the full message history to recall storage
  3. Keeps evicted messages searchable via the conversation_search tool

This happens transparently, so your agent maintains continuity across long conversations.
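
One way to see this in practice is to watch context usage grow over a long conversation. The sketch below assumes a context.retrieve(agentId) method returning a context window overview, and assumes the field names shown; both are assumptions about the current SDK rather than documented calls.

```typescript
// Sketch: check how full the context window is before summarization kicks in.
// context.retrieve and the field names are assumptions about the current SDK.
const overview = await client.agents.context.retrieve(agent.id);
console.log(
  `context tokens: ${overview.contextWindowSizeCurrent} / ${overview.contextWindowSizeMax}`
);
// Once the window nears its limit, older turns are summarized and moved to
// recall storage, where the agent can still find them via conversation_search.
```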

Agents can insert memories during conversations, or you can populate archival memory programmatically:

```typescript
// Insert a memory via the SDK
await client.agents.passages.insert(agent.id, {
  content: "The user prefers TypeScript over JavaScript for type safety.",
  tags: ["preferences", "languages"],
});

// The agent can now search this:
// archival_memory_search(query="language preferences")
```
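
To confirm what was stored, you can page through the agent's archival passages from the SDK. A sketch, assuming a passages.list method alongside the passages.insert call shown above.

```typescript
// Sketch: list archival passages to confirm the insert above landed.
// passages.list is an assumption alongside the documented passages.insert.
const passages = await client.agents.passages.list(agent.id);
console.log(`archival passages stored: ${passages.length}`);
```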

Learn more about archival memory →

Key concepts from the MemGPT research:

  • Self-editing memory: Agents actively manage their own memory
  • Memory hierarchy: In-context vs out-of-context storage
  • Tool-based memory management: Agents decide what to remember
  • Stateful agents: Persistent memory across all interactions

Read the MemGPT paper → Take the free course →

  • Memory Blocks: Deep dive into memory block structure
  • Archival Memory: Long-term semantic storage
  • Base Tools: Built-in tools for memory management
  • Context Engineering: Optimizing agent memory usage