# Memory Blocks

Deep dive into memory block structure
Letta agents solve the context window limitation of LLMs through context engineering across two tiers of memory: in-context (core) memory (including system instructions, read-write memory blocks, and conversation history), and out-of-context memory (older evicted conversation history and archival storage).
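As a rough mental model (a sketch, not the Letta implementation), the two tiers can be pictured as plain data structures: the in-context tier is everything compiled into the prompt on every LLM call, while the out-of-context tier is storage the agent reaches only through tools.

```python
from dataclasses import dataclass, field

@dataclass
class InContextMemory:
    """Everything compiled into the prompt on every LLM call."""
    system_instructions: str
    memory_blocks: dict[str, str]                      # label -> value, read-write
    message_history: list[str] = field(default_factory=list)

@dataclass
class OutOfContextMemory:
    """Storage the agent reaches through tools, not the prompt."""
    recall: list[str] = field(default_factory=list)    # evicted conversation history
    archival: list[str] = field(default_factory=list)  # long-term facts

core = InContextMemory(
    system_instructions="You are a helpful agent.",
    memory_blocks={"human": "Name: Chad", "persona": "Name: Sam"},
)
external = OutOfContextMemory()
```

The split matters because only `InContextMemory` consumes context-window tokens; everything in `OutOfContextMemory` is free until the agent explicitly pages it back in.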
To learn more about the research origins, read the MemGPT research paper, or take the free LLM OS course on DeepLearning.ai.
```mermaid
graph LR
    subgraph CONTEXT[Context Window]
        SYS[System Instructions]
        CORE[Memory Blocks]
        MSGS[Messages]
    end
    RECALL[Recall Memory]
    ARCH[Archival Memory]
    CONTEXT <--> RECALL
    CONTEXT <--> ARCH
```
Your agent's context window contains:

- System instructions
- Memory blocks (read-write core memory)
- Message history
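A minimal sketch (hypothetical function, not the Letta internals) of how those pieces could be compiled into a single prompt string:

```python
def compile_context(system: str, blocks: dict[str, str], messages: list[str]) -> str:
    """Assemble the context window: instructions, then memory blocks, then history."""
    block_text = "\n".join(
        f"<{label}>\n{value}\n</{label}>" for label, value in blocks.items()
    )
    return "\n\n".join([system, block_text, "\n".join(messages)])

prompt = compile_context(
    "You are a helpful agent.",
    {"human": "Name: Chad", "persona": "Name: Sam"},
    ["user: hi", "assistant: hello!"],
)
```

Because the whole prompt is rebuilt on every call, editing a memory block immediately changes what the model sees on the next turn.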
When the context window fills up:

- Older messages are evicted to recall memory, searchable with the conversation_search tool
- Long-term facts and knowledge live in archival memory, searchable with the archival_memory_search tool

Letta's agent architecture follows modern LLM patterns.
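The eviction step can be sketched as a simple budget check (illustrative only; the hosted system also summarizes what it evicts rather than dropping raw messages):

```python
def evict_if_full(messages: list[str], recall: list[str], max_messages: int) -> None:
    """Move the oldest in-context messages into recall storage."""
    while len(messages) > max_messages:
        recall.append(messages.pop(0))  # oldest message leaves the context window

msgs = [f"msg {i}" for i in range(6)]
recall: list[str] = []
evict_if_full(msgs, recall, max_messages=4)
# The two oldest messages are now only reachable via recall search
```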
This architecture is optimized for frontier models like GPT-5 and Claude Sonnet 4.5.
Learn more about the architecture evolution →
Letta agents have tools to manage their own memory:
- `memory_insert` - Insert text into a memory block
- `memory_replace` - Replace specific text in a memory block
- `memory_rethink` - Completely rewrite a memory block
- `conversation_search` - Search prior conversation history
- `archival_memory_insert` - Store facts and knowledge long-term
- `archival_memory_search` - Query semantic storage

Learn more about memory tools →
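The three block-editing tools map onto simple string operations over a label -> value store. A rough sketch of their semantics (illustrative, not Letta's actual implementation):

```python
blocks = {"human": "The human's name is Chad."}

def memory_insert(label: str, text: str) -> None:
    """Append text to an existing memory block."""
    blocks[label] = blocks[label] + " " + text

def memory_replace(label: str, old: str, new: str) -> None:
    """Replace an exact substring inside a block."""
    blocks[label] = blocks[label].replace(old, new)

def memory_rethink(label: str, text: str) -> None:
    """Discard the old value and rewrite the whole block."""
    blocks[label] = text

memory_insert("human", "They like vibe coding.")
memory_replace("human", "Chad", "Charles")
```

The distinction is about edit granularity: `memory_insert` grows a block, `memory_replace` patches it, and `memory_rethink` is for when accumulated edits have made a full rewrite cleaner.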
Agents are created with memory blocks that define their persistent context:
```typescript
import { LettaClient } from "@letta-ai/letta-client";

const client = new LettaClient({ token: "LETTA_API_KEY" });

const agent = await client.agents.create({
  model: "openai/gpt-4o-mini",
  embedding: "openai/text-embedding-3-small",
  memoryBlocks: [
    {
      label: "human",
      value: "The human's name is Chad. They like vibe coding.",
    },
    {
      label: "persona",
      value: "My name is Sam, the all-knowing sentient AI.",
    },
  ],
  tools: ["web_search", "run_code"],
});
```

```python
from letta_client import Letta

client = Letta(token="LETTA_API_KEY")

agent = client.agents.create(
    model="openai/gpt-4o-mini",
    embedding="openai/text-embedding-3-small",
    memory_blocks=[
        {
            "label": "human",
            "value": "The human's name is Chad. They like vibe coding."
        },
        {
            "label": "persona",
            "value": "My name is Sam, the all-knowing sentient AI."
        }
    ],
    tools=["web_search", "run_code"]
)
```

```shell
curl -X POST https://api.letta.com/v1/agents \
  -H "Authorization: Bearer $LETTA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-4o-mini",
    "embedding": "openai/text-embedding-3-small",
    "memory_blocks": [
      {
        "label": "human",
        "value": "The human'\''s name is Chad. They like vibe coding."
      },
      {
        "label": "persona",
        "value": "My name is Sam, the all-knowing sentient AI."
      }
    ],
    "tools": ["web_search", "run_code"]
  }'
```

When the context window fills up, Letta automatically evicts older messages to recall memory, where the agent can retrieve them with the conversation_search tool. This happens transparently - your agent maintains continuity.
Agents can insert memories during conversations, or you can populate archival memory programmatically:
```typescript
// Insert a memory via SDK
await client.agents.passages.insert(agent.id, {
  content: "The user prefers TypeScript over JavaScript for type safety.",
  tags: ["preferences", "languages"],
});

// Agent can now search this
// Agent calls: archival_memory_search(query="language preferences")
```

```python
# Insert a memory via SDK
client.agents.passages.insert(
    agent_id=agent.id,
    content="The user prefers TypeScript over JavaScript for type safety.",
    tags=["preferences", "languages"]
)

# Agent can now search this
# Agent calls: archival_memory_search(query="language preferences")
```

Learn more about archival memory →
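Archival storage is queried semantically, by embedding similarity rather than exact keywords. A toy sketch of that retrieval step, using bag-of-words cosine similarity in place of a real embedding model (all names here are hypothetical, not the SDK's API):

```python
from collections import Counter
from math import sqrt

passages = [
    "The user prefers TypeScript over JavaScript for type safety.",
    "The user's favorite editor is VS Code.",
]

def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def search_passages(query: str, top_k: int = 1) -> list[str]:
    """Return the stored passages most similar to the query."""
    q = embed(query)
    return sorted(passages, key=lambda p: cosine(q, embed(p)), reverse=True)[:top_k]
```

This is why the agent's `archival_memory_search(query="language preferences")` call above can surface the passage even though it never contains the literal word "preferences" with that spelling and casing: ranking is by similarity, not exact match.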
Key concepts from the MemGPT research:

- A two-tier memory hierarchy: in-context (core) memory and out-of-context storage
- Self-editing memory: the agent manages its own memory through tool calls
- Automatic paging: older messages are evicted out of context when the window fills
Read the MemGPT paper →
Take the free course →
- Archival Memory - Long-term semantic storage
- Base Tools - Built-in tools for memory management
- Context Engineering - Optimizing agent memory usage