# Extractors

Extractors select what content to evaluate from an agent’s response. They navigate the conversation trajectory and extract the specific piece you want to grade.

Common patterns:

  • `last_assistant` - Most common; gets the agent's final message (90% of use cases)
  • `tool_arguments` - Verify the agent called the right tool with the correct arguments
  • `memory_block` - Check whether the agent updated memory correctly
  • `pattern` - Extract structured data with a regex

An agent's response is complex: it includes assistant messages, tool calls, tool returns, memory updates, and more. Extractors let you focus on exactly the part you want to evaluate.

The evaluation flow:

```
Agent Response → Extractor → Submission Text → Grader → Score
```

For example:

```
Full trajectory:
  UserMessage:       "What's the capital of France?"
  ToolCallMessage:   search(query="capital of france")
  ToolReturnMessage: "Paris is the capital..."
  AssistantMessage:  "The capital of France is Paris."

    ↓ extractor: last_assistant ↓

Extracted: "The capital of France is Paris."

    ↓ grader: contains (ground_truth="Paris") ↓

Score: 1.0
```

A trajectory is a list of turns, where each turn is a list of Letta messages:

```python
[
    [UserMessage(...), AssistantMessage(...), ToolCallMessage(...), ToolReturnMessage(...)],  # Turn 1
    [AssistantMessage(...)],  # Turn 2
]
```

Extractors navigate this structure to pull out the submission text.
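As a rough sketch of that navigation (using stand-in message objects with `message_type` and `content` attributes; the real Letta message classes differ), a `last_assistant`-style extractor walks the turns in reverse until it finds an assistant message:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Msg:
    # Stand-in for a Letta message; the actual classes are richer.
    message_type: str
    content: str

def last_assistant(trajectory: List[List[Msg]]) -> str:
    """Return the content of the last assistant message, or "" if none exists."""
    for turn in reversed(trajectory):
        for msg in reversed(turn):
            if msg.message_type == "assistant_message":
                return msg.content
    return ""
```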

### last_assistant

Extracts the content of the last assistant message.

```yaml
graders:
  quality:
    kind: tool
    function: contains
    extractor: last_assistant  # Extract final agent message
```

Most common extractor - gets the agent’s final response.

### first_assistant

Extracts the content of the first assistant message.

```yaml
graders:
  initial_response:
    kind: tool
    function: contains
    extractor: first_assistant  # Extract first agent message
```

Useful for testing immediate responses before tool usage.

### all_assistant

Concatenates all assistant messages with a separator.

```yaml
graders:
  complete_response:
    kind: rubric
    prompt_path: rubric.txt
    extractor: all_assistant  # Concatenate all agent messages
    extractor_config:
      separator: "\n\n"  # Join messages with a double newline
```

Use when you need the full conversation context.
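The concatenation itself is just a separator join. A minimal illustration (the message texts here are hypothetical):

```python
# Hypothetical assistant messages collected from the trajectory.
messages = ["I'll look that up.", "The capital of France is Paris."]

# all_assistant joins every assistant message with the configured separator.
separator = "\n\n"
submission = separator.join(messages)
```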

### last_turn

Extracts all assistant messages from the last turn only.

```yaml
graders:
  final_turn:
    kind: tool
    function: contains
    extractor: last_turn  # Messages from the final turn only
    extractor_config:
      separator: " "  # Join with spaces
```

Useful when the agent makes multiple statements in the final turn.

### pattern

Extracts content matching a regex pattern from assistant messages.

```yaml
graders:
  extract_number:
    kind: tool
    function: exact_match
    extractor: pattern  # Extract using a regex
    extractor_config:
      pattern: 'Result: (\d+)'  # Regex pattern to match
      group: 1                  # Extract capture group 1
      search_all: false         # Only use the first match
```

Example: Extract “42” from “The answer is Result: 42”
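The config above maps naturally onto Python's `re.search` with a capture group. A sketch of the idea (an illustration, not the library's actual implementation):

```python
import re

def extract_pattern(text: str, pattern: str, group: int = 1) -> str:
    """Return the requested capture group from the first match, or "" if no match."""
    match = re.search(pattern, text)
    return match.group(group) if match else ""
```

With `pattern='Result: (\d+)'` and `group: 1`, the text "The answer is Result: 42" yields "42"; a non-matching text yields the empty string.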

### tool_arguments

Extracts the arguments from a specific tool call.

```yaml
graders:
  search_query:
    kind: tool
    function: contains
    extractor: tool_arguments  # Extract tool call arguments
    extractor_config:
      tool_name: search  # Which tool to extract from
```

Returns the JSON arguments as a string.

Example: If agent calls search(query="pandas", limit=10), extracts:

```json
{ "query": "pandas", "limit": 10 }
```
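In other words, the submission text is the call's argument object serialized to JSON. A rough equivalent, assuming the arguments arrive as a dict (the real serialization may format whitespace or key order differently):

```python
import json

# Hypothetical arguments captured from a search(...) tool call.
tool_args = {"query": "pandas", "limit": 10}

# The extractor hands the grader a JSON string, not a dict.
submission = json.dumps(tool_args)
```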

### tool_output

Extracts the return value from a specific tool call.

```yaml
graders:
  search_results:
    kind: tool
    function: contains
    extractor: tool_output  # Extract tool return value
    extractor_config:
      tool_name: search  # Which tool's output to extract
```

Finds the tool call and its corresponding return message.

### after_marker

Extracts the content after a specific marker string.

```yaml
graders:
  answer_section:
    kind: tool
    function: contains
    extractor: after_marker  # Extract content after marker
    extractor_config:
      marker: "ANSWER:"      # Marker string to find
      include_marker: false  # Don't include "ANSWER:" in the output
```

Example: From “Here’s my analysis… ANSWER: Paris”, extracts “Paris”
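The marker logic can be sketched with `str.partition` (an illustration of the behavior, not the actual implementation):

```python
def after_marker(text: str, marker: str, include_marker: bool = False) -> str:
    """Return everything after the first occurrence of marker, or "" if absent."""
    _before, found, after = text.partition(marker)
    if not found:
        return ""
    # Optionally keep the marker itself in the extracted text.
    return (marker + after).strip() if include_marker else after.strip()
```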

### memory_block

Extracts content from a specific memory block (requires `agent_state`).

```yaml
graders:
  human_memory:
    kind: tool
    function: exact_match
    extractor: memory_block  # Extract from agent memory
    extractor_config:
      block_label: human  # Which memory block to extract
```

Example use case: Verify the agent correctly updated its memory about the user.

## Extractor configuration

Some extractors accept additional configuration via `extractor_config`:

```yaml
graders:
  my_metric:
    kind: tool
    function: contains
    extractor: pattern           # Use the pattern extractor
    extractor_config:            # Configuration for this extractor
      pattern: "Answer: (.*)"   # Regex pattern
      group: 1                  # Extract capture group 1
```
## Choosing an extractor

| Use case | Recommended extractor |
| --- | --- |
| Final agent response | `last_assistant` |
| First response before tools | `first_assistant` |
| Complete conversation | `all_assistant` |
| Specific format extraction | `pattern` |
| Tool usage validation | `tool_arguments` |
| Tool result checking | `tool_output` |
| Memory validation | `memory_block` |
| Structured output | `after_marker` |

Assistant messages can contain multiple content parts. Extractors automatically flatten complex content to plain text.

If an extractor finds no matching content, it returns an empty string (`""`). This typically results in a score of 0.0 from the grader.
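To see why the score bottoms out, consider a hypothetical contains-style check (a sketch, not the library's actual grader code): a non-empty ground truth can never be found in an empty submission.

```python
def contains_score(submission: str, ground_truth: str) -> float:
    # Empty submissions never contain a non-empty ground truth, so they score 0.0.
    return 1.0 if ground_truth in submission else 0.0
```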

## Custom extractors

You can write custom extractors. See Custom Extractors for details.

Example:

```python
from typing import List

from letta_client import LettaMessageUnion
from letta_evals.decorators import extractor

@extractor
def my_extractor(trajectory: List[List[LettaMessageUnion]], config: dict) -> str:
    # Custom extraction logic goes here
    extracted_text = ""
    return extracted_text
```

Register by importing in your suite’s setup script or custom evaluators file.

## Combining extractors

Different graders can use different extractors:

```yaml
graders:
  response_quality:              # Evaluate final message quality
    kind: rubric
    prompt_path: quality.txt
    extractor: last_assistant    # Extract final response
  tool_usage:                    # Check the tool was called correctly
    kind: tool
    function: exact_match
    extractor: tool_arguments    # Extract tool args
    extractor_config:
      tool_name: search          # From the search tool
  memory_update:                 # Verify memory was updated
    kind: tool
    function: contains
    extractor: memory_block      # Extract from memory
    extractor_config:
      block_label: human         # Human memory block
```

Each grader independently extracts and evaluates different aspects.

## Listing extractors

See all available extractors:

```shell
letta-evals list-extractors
```
## Quick examples

```yaml
extractor: last_assistant  # Get the final agent message
```

Agent: "Let me search... *uses tool* ... The answer is Paris."
Extracted: `"The answer is Paris."`

```yaml
extractor: tool_arguments  # Get tool call args
extractor_config:
  tool_name: search  # From the search tool
```

Agent calls: `search(query="pandas", limit=5)`
Extracted: `{"query": "pandas", "limit": 5}`

```yaml
extractor: pattern  # Extract with a regex
extractor_config:
  pattern: 'RESULT: (\w+)'  # Pattern to match
  group: 1                  # Extract capture group 1
```

Agent: "After calculation... RESULT: SUCCESS"
Extracted: `"SUCCESS"`

```yaml
extractor: memory_block  # Extract from agent memory
extractor_config:
  block_label: human  # Human memory block
```

Agent updates memory block "human" to: "User's name is Alice"
Extracted: `"User's name is Alice"`