Extractors
Extractors select what content to evaluate from an agent’s response. They navigate the conversation trajectory and extract the specific piece you want to grade.
Common patterns:
- last_assistant: the most common choice; gets the agent's final message (90% of use cases)
- tool_arguments: verify the agent called the right tool with the correct arguments
- memory_block: check whether the agent updated its memory correctly
- pattern: extract structured data with a regex
Why Extractors?
An agent's response is complex: it includes assistant messages, tool calls, tool returns, memory updates, and more. Extractors let you focus on exactly the part you want to evaluate.
The evaluation flow:
Agent Response → Extractor → Submission Text → Grader → Score

For example:

```
Full trajectory:
  UserMessage: "What's the capital of France?"
  ToolCallMessage: search(query="capital of france")
  ToolReturnMessage: "Paris is the capital..."
  AssistantMessage: "The capital of France is Paris."

      ↓ extractor: last_assistant ↓

Extracted: "The capital of France is Paris."

      ↓ grader: contains (ground_truth="Paris") ↓

Score: 1.0
```

Trajectory Structure
A trajectory is a list of turns, where each turn is a list of Letta messages:
```python
[
    [UserMessage(...), AssistantMessage(...), ToolCallMessage(...), ToolReturnMessage(...)],  # Turn 1
    [AssistantMessage(...)],  # Turn 2
]
```

Extractors navigate this structure to pull out the submission text.
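To make the nesting concrete, here is a minimal sketch that walks a trajectory turn by turn, the way an extractor would. The `Message` dataclass is a hypothetical stand-in for the real `letta_client` message classes, and the `message_type` strings are assumptions modeled on Letta's naming.

```python
from dataclasses import dataclass
from typing import Iterator, List, Tuple


@dataclass
class Message:
    # Hypothetical stand-in for a letta_client message class
    message_type: str  # assumed values such as "assistant_message"
    content: str = ""


Trajectory = List[List[Message]]


def walk(trajectory: Trajectory) -> Iterator[Tuple[int, Message]]:
    """Yield (turn_index, message) pairs in conversation order."""
    for turn_index, turn in enumerate(trajectory):
        for message in turn:
            yield turn_index, message


trajectory: Trajectory = [
    [  # Turn 1
        Message("user_message", "What's the capital of France?"),
        Message("tool_call_message", 'search(query="capital of france")'),
        Message("tool_return_message", "Paris is the capital..."),
        Message("assistant_message", "The capital of France is Paris."),
    ],
    [Message("assistant_message", "Let me know if you need anything else.")],  # Turn 2
]

for turn_index, message in walk(trajectory):
    print(turn_index, message.message_type)
```

Every built-in extractor is, in effect, a different filter over this walk: which turns to look at, which message types to keep, and how to join what survives.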
Built-in Extractors
last_assistant
Extracts the last assistant message content.
```yaml
graders:
  quality:
    kind: tool
    function: contains
    extractor: last_assistant  # Extract final agent message
```

This is the most common extractor: it gets the agent's final response.
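The behavior described above can be sketched as a reverse scan over the trajectory. This is an illustration, not the library's implementation; `Message` is a hypothetical stand-in for the `letta_client` classes and the `"assistant_message"` type string is an assumption.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Message:
    # Hypothetical stand-in for a letta_client message class
    message_type: str
    content: str = ""


def last_assistant(trajectory: List[List[Message]]) -> str:
    """Return the content of the final assistant message, or "" if there is none."""
    for turn in reversed(trajectory):
        for message in reversed(turn):
            if message.message_type == "assistant_message":
                return message.content
    return ""  # No assistant message: empty submission


trajectory = [
    [
        Message("user_message", "What's the capital of France?"),
        Message("assistant_message", "Let me search for that."),
        Message("tool_call_message", 'search(query="capital of france")'),
        Message("tool_return_message", "Paris is the capital..."),
    ],
    [Message("assistant_message", "The capital of France is Paris.")],
]

print(last_assistant(trajectory))  # The capital of France is Paris.
```

Scanning from the end means the extractor stops at the first hit instead of walking the whole conversation.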
first_assistant
Extracts the first assistant message content.
```yaml
graders:
  initial_response:
    kind: tool
    function: contains
    extractor: first_assistant  # Extract first agent message
```

Useful for testing the agent's immediate response before it uses any tools.
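A minimal sketch of the same idea as a forward scan, again using a hypothetical `Message` stand-in and an assumed `"assistant_message"` type string rather than the real library classes:

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Message:
    # Hypothetical stand-in for a letta_client message class
    message_type: str
    content: str = ""


def first_assistant(trajectory: List[List[Message]]) -> str:
    """Return the content of the earliest assistant message, or "" if there is none."""
    for turn in trajectory:
        for message in turn:
            if message.message_type == "assistant_message":
                return message.content
    return ""


trajectory = [
    [
        Message("user_message", "What's the capital of France?"),
        Message("assistant_message", "Let me search for that."),
        Message("tool_call_message", 'search(query="capital of france")'),
    ],
    [Message("assistant_message", "The capital of France is Paris.")],
]

print(first_assistant(trajectory))  # Let me search for that.
```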
all_assistant
Concatenates all assistant messages with a separator.
```yaml
graders:
  complete_response:
    kind: rubric
    prompt_path: rubric.txt
    extractor: all_assistant  # Concatenate all agent messages
    extractor_config:
      separator: "\n\n"  # Join messages with a blank line
```

Use it when the grader needs everything the agent said across the conversation, not just its final message.
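Here is a sketch of how `separator` from `extractor_config` could drive the concatenation. `Message` is a hypothetical stand-in, and the fallback separator used when the key is missing is an assumption of this sketch, not the library's documented default.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Message:
    # Hypothetical stand-in for a letta_client message class
    message_type: str
    content: str = ""


def all_assistant(trajectory: List[List[Message]], config: dict) -> str:
    """Join every assistant message in the trajectory with the configured separator."""
    separator = config.get("separator", "\n")  # fallback default is an assumption
    return separator.join(
        message.content
        for turn in trajectory
        for message in turn
        if message.message_type == "assistant_message"
    )


trajectory = [
    [Message("user_message", "What's the capital of France?"), Message("assistant_message", "Let me check.")],
    [Message("assistant_message", "The capital of France is Paris.")],
]

print(all_assistant(trajectory, {"separator": "\n\n"}))
```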
last_turn
Extracts all assistant messages from the last turn only.
```yaml
graders:
  final_turn:
    kind: tool
    function: contains
    extractor: last_turn  # Messages from final turn only
    extractor_config:
      separator: " "  # Join with spaces
```

Useful when the agent makes multiple statements in the final turn.
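The difference from all_assistant is only which turns are scanned. A minimal sketch, with a hypothetical `Message` stand-in and an assumed fallback separator:

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Message:
    # Hypothetical stand-in for a letta_client message class
    message_type: str
    content: str = ""


def last_turn(trajectory: List[List[Message]], config: dict) -> str:
    """Join the assistant messages of the final turn only."""
    if not trajectory:
        return ""
    separator = config.get("separator", "\n")  # fallback default is an assumption
    return separator.join(
        message.content
        for message in trajectory[-1]
        if message.message_type == "assistant_message"
    )


trajectory = [
    [Message("assistant_message", "Hello!")],  # Earlier turn: ignored
    [
        Message("assistant_message", "The capital is Paris."),
        Message("tool_call_message", "log_answer(...)"),
        Message("assistant_message", "Anything else?"),
    ],
]

print(last_turn(trajectory, {"separator": " "}))  # The capital is Paris. Anything else?
```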
pattern
Extracts content matching a regex pattern from assistant messages.
```yaml
graders:
  extract_number:
    kind: tool
    function: exact_match
    extractor: pattern  # Extract using regex
    extractor_config:
      pattern: 'Result: (\d+)'  # Regex pattern to match
      group: 1  # Extract capture group 1
      search_all: false  # Only find the first match
```

Example: extracts “42” from “The answer is Result: 42”.
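The `pattern`, `group`, and `search_all` options map naturally onto Python's `re.finditer`. This sketch takes the assistant texts directly instead of a trajectory; how the real extractor joins multiple matches when `search_all` is true is not stated here, so the newline join is an assumption.

```python
import re
from typing import List


def extract_pattern(texts: List[str], pattern: str, group: int = 0, search_all: bool = False) -> str:
    """Return the chosen capture group of the first match, or of every match if search_all."""
    matches: List[str] = []
    for text in texts:
        for match in re.finditer(pattern, text):
            matches.append(match.group(group))
            if not search_all:
                return matches[0]  # Stop at the first match
    return "\n".join(matches)  # Assumption: multiple matches joined by newlines


print(extract_pattern(["The answer is Result: 42"], r"Result: (\d+)", group=1))  # 42
```

Note that `group=0` returns the whole match, so “Result: 42” rather than “42”; the capture group is what lets `exact_match` compare against a bare value.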
tool_arguments
Extracts arguments from a specific tool call.
```yaml
graders:
  search_query:
    kind: tool
    function: contains
    extractor: tool_arguments  # Extract tool call arguments
    extractor_config:
      tool_name: search  # Which tool to extract from
```

Returns the tool call's arguments as a JSON string.
Example: if the agent calls search(query="pandas", limit=10), the extractor returns:
```json
{ "query": "pandas", "limit": 10 }
```

tool_output
Extracts the return value from a specific tool call.
```yaml
graders:
  search_results:
    kind: tool
    function: contains
    extractor: tool_output  # Extract tool return value
    extractor_config:
      tool_name: search  # Which tool's output to extract
```

Finds the tool call and its corresponding return message.
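Pairing a call with its return requires some shared key. This sketch assumes the pairing is done through a `tool_call_id` field and that the first matching call wins; both the stand-in classes and that matching rule are assumptions for illustration, not the library's documented behavior.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class ToolCallMessage:
    # Hypothetical stand-in for a tool-call message
    name: str
    arguments: str
    tool_call_id: str


@dataclass
class ToolReturnMessage:
    # Hypothetical stand-in for the matching tool-return message
    tool_call_id: str
    tool_return: str


Message = Union[ToolCallMessage, ToolReturnMessage]


def tool_output(trajectory: List[List[Message]], tool_name: str) -> str:
    """Return the output of the first call to tool_name, or "" if none is found."""
    messages = [m for turn in trajectory for m in turn]
    # Ids of calls to the requested tool, in conversation order
    call_ids = [m.tool_call_id for m in messages if isinstance(m, ToolCallMessage) and m.name == tool_name]
    for call_id in call_ids:
        for m in messages:
            if isinstance(m, ToolReturnMessage) and m.tool_call_id == call_id:
                return m.tool_return
    return ""


trajectory = [[
    ToolCallMessage("send_email", "{}", "call-0"),
    ToolReturnMessage("call-0", "sent"),
    ToolCallMessage("search", '{"query": "capital of france"}', "call-1"),
    ToolReturnMessage("call-1", "Paris is the capital..."),
]]

print(tool_output(trajectory, "search"))  # Paris is the capital...
```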
after_marker
Extracts content after a specific marker string.
```yaml
graders:
  answer_section:
    kind: tool
    function: contains
    extractor: after_marker  # Extract content after marker
    extractor_config:
      marker: "ANSWER:"  # Marker string to find
      include_marker: false  # Don't include "ANSWER:" in output
```

Example: from “Here’s my analysis… ANSWER: Paris”, extracts “Paris”.
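A minimal sketch of the marker logic on a single string. It assumes the first occurrence of the marker is used and that surrounding whitespace is trimmed; neither detail is confirmed by the description above.

```python
def after_marker(text: str, marker: str, include_marker: bool = False) -> str:
    """Return the text after the first occurrence of marker, or "" if it is absent."""
    index = text.find(marker)
    if index == -1:
        return ""  # Marker absent: empty extraction
    start = index if include_marker else index + len(marker)
    return text[start:].strip()  # Assumption: surrounding whitespace is trimmed


text = "Here’s my analysis… ANSWER: Paris"
print(after_marker(text, "ANSWER:"))  # Paris
```

Asking the agent to end with a fixed marker is a cheap way to get a gradeable answer out of free-form reasoning without writing a regex.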
memory_block
Extracts content from a specific memory block (requires agent_state).
```yaml
graders:
  human_memory:
    kind: tool
    function: exact_match
    extractor: memory_block  # Extract from agent memory
    extractor_config:
      block_label: human  # Which memory block to extract
```

Example use case: verify the agent correctly updated its memory about the user.
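Unlike the other extractors, memory_block reads from the agent's state rather than the message trajectory. This sketch models that state with hypothetical `AgentState` and `Block` stand-ins (a label plus a text value); the real `agent_state` object and its field names may differ, and returning `""` when no state is available is this sketch's choice.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Block:
    # Hypothetical stand-in for a memory block: a label plus its text value
    label: str
    value: str


@dataclass
class AgentState:
    # Hypothetical stand-in for the agent_state the extractor requires
    blocks: List[Block] = field(default_factory=list)


def memory_block(agent_state: Optional[AgentState], block_label: str) -> str:
    """Return the value of the block with the given label, or "" if it is missing."""
    if agent_state is None:
        return ""  # This sketch returns "" when no agent_state is available
    for block in agent_state.blocks:
        if block.label == block_label:
            return block.value
    return ""


state = AgentState(blocks=[Block("persona", "I am a helpful assistant."), Block("human", "User’s name is Alice")])
print(memory_block(state, "human"))  # User’s name is Alice
```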
Extractor Configuration
Some extractors accept additional configuration via extractor_config:
```yaml
graders:
  my_metric:
    kind: tool
    function: contains
    extractor: pattern  # Use pattern extractor
    extractor_config:  # Configuration for this extractor
      pattern: "Answer: (.*)"  # Regex pattern
      group: 1  # Extract capture group 1
```

Choosing an Extractor
| Use Case | Recommended Extractor |
|---|---|
| Final agent response | last_assistant |
| First response before tools | first_assistant |
| Everything the agent said | all_assistant |
| Specific format extraction | pattern |
| Tool usage validation | tool_arguments |
| Tool result checking | tool_output |
| Memory validation | memory_block |
| Content after a marker (e.g. “ANSWER:”) | after_marker |
| All messages in the final turn | last_turn |
Content Flattening
Assistant messages can contain multiple content parts. Extractors automatically flatten complex content to plain text.
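One plausible way to flatten multi-part content is to keep only the text parts and join them. The `TextPart` and `ImagePart` classes are hypothetical stand-ins, and both the choice to drop non-text parts and the newline join are assumptions of this sketch.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class TextPart:
    # Hypothetical stand-in for a text content part
    text: str


@dataclass
class ImagePart:
    # Hypothetical stand-in for a non-text content part
    url: str


Content = Union[str, List[Union[TextPart, ImagePart]]]


def flatten_content(content: Content) -> str:
    """Reduce message content to plain text."""
    if isinstance(content, str):
        return content  # Already plain text
    # Keep only text parts; joining with a newline is an assumption
    return "\n".join(part.text for part in content if isinstance(part, TextPart))


print(flatten_content([TextPart("The capital"), ImagePart("https://example.com/map.png"), TextPart("is Paris.")]))
```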
Empty Extraction
If an extractor finds no matching content, it returns an empty string "". This typically results in a score of 0.0 from the grader.
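Why an empty submission usually scores 0.0 is easy to see with a substring grader. The `contains` function below is a hypothetical simplification written for illustration, not the built-in grader's code.

```python
def contains(submission: str, ground_truth: str) -> float:
    # Hypothetical grader: 1.0 if the ground truth appears in the submission, else 0.0
    return 1.0 if ground_truth in submission else 0.0


print(contains("The capital of France is Paris.", "Paris"))  # 1.0
print(contains("", "Paris"))  # 0.0: nothing was extracted, so nothing can match
```

A run of unexpected zeros is therefore often an extraction problem (wrong tool_name, missing marker, pattern that never matches) rather than a bad agent response.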
Custom Extractors
You can write custom extractors. See Custom Extractors for details.
Example:
```python
from typing import List

from letta_client import LettaMessageUnion
from letta_evals.decorators import extractor


@extractor
def my_extractor(trajectory: List[List[LettaMessageUnion]], config: dict) -> str:
    # Custom extraction logic goes here; replace this placeholder
    extracted_text = ""
    return extracted_text
```

Register it by importing it in your suite's setup script or custom evaluators file.
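As a concrete illustration of the `(trajectory, config) -> str` shape, here is a hypothetical extractor that lists the names of every tool the agent called. The `Message` class, its `tool_name` field, the `"tool_call_message"` type string, and the `separator` config key are all assumptions made so the sketch runs on its own; in a real suite you would take these from `letta_client` messages and decorate the function with `@extractor` from `letta_evals.decorators`.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Message:
    # Hypothetical stand-in for a letta_client message
    message_type: str
    tool_name: str = ""  # Set only on tool-call messages in this sketch
    content: str = ""


# In a real suite, decorate this with @extractor from letta_evals.decorators
def tools_called(trajectory: List[List[Message]], config: dict) -> str:
    """Return the names of all tools the agent called, in order, joined by a separator."""
    separator = config.get("separator", ",")  # Hypothetical config key for this sketch
    names = [
        message.tool_name
        for turn in trajectory
        for message in turn
        if message.message_type == "tool_call_message"
    ]
    return separator.join(names)


trajectory = [
    [Message("user_message", content="What is 6 * 7?"), Message("tool_call_message", tool_name="search")],
    [Message("tool_call_message", tool_name="calculator"), Message("assistant_message", content="42")],
]

print(tools_called(trajectory, {}))  # search,calculator
```

Paired with an exact_match grader, a submission like this checks tool order, which none of the built-in extractors covers on their own.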
Multi-Metric Extraction
Different graders can use different extractors:
```yaml
graders:
  response_quality:  # Evaluate final message quality
    kind: rubric
    prompt_path: quality.txt
    extractor: last_assistant  # Extract final response

  tool_usage:  # Check the tool was called correctly
    kind: tool
    function: exact_match
    extractor: tool_arguments  # Extract tool args
    extractor_config:
      tool_name: search  # From search tool

  memory_update:  # Verify memory was updated
    kind: tool
    function: contains
    extractor: memory_block  # Extract from memory
    extractor_config:
      block_label: human  # Human memory block
```

Each grader independently extracts and evaluates a different aspect of the response.
Listing Extractors
See all available extractors:
```shell
letta-evals list-extractors
```

Examples
Extract Final Answer
```yaml
extractor: last_assistant  # Get final agent message
```

Agent: “Let me search…” (uses tool) “The answer is Paris.”
Extracted: “The answer is Paris.”
Extract Tool Arguments
```yaml
extractor: tool_arguments  # Get tool call args
extractor_config:
  tool_name: search  # From search tool
```

Agent calls: search(query="pandas", limit=5)
Extracted: {"query": "pandas", "limit": 5}
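A sketch of how the extracted string in this example could be produced, assuming the tool call carries its arguments as a JSON-encoded string and that the first matching call is used. The `ToolCallMessage` stand-in and the re-serialization step are assumptions, and the trajectory holds only tool calls for brevity.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass
class ToolCallMessage:
    # Hypothetical stand-in; arguments arrive as a JSON-encoded string
    name: str
    arguments: str


def tool_arguments(trajectory: List[List[ToolCallMessage]], tool_name: str) -> str:
    """Return the arguments of the first call to tool_name as JSON text, or ""."""
    for turn in trajectory:
        for message in turn:
            if message.name == tool_name:
                # Re-serialize so the submission is normalized JSON text (an assumption)
                return json.dumps(json.loads(message.arguments))
    return ""


trajectory = [[ToolCallMessage("search", '{"query":"pandas","limit":5}')]]
print(tool_arguments(trajectory, "search"))  # {"query": "pandas", "limit": 5}
```

Because the submission is a string, an exact_match grader is sensitive to key order and spacing; a contains grader checking a fragment such as "pandas" is more forgiving.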
Extract Pattern
```yaml
extractor: pattern  # Extract with regex
extractor_config:
  pattern: 'RESULT: (\w+)'  # Match pattern
  group: 1  # Extract capture group 1
```

Agent: “After calculation… RESULT: SUCCESS”
Extracted: “SUCCESS”
Extract Memory
```yaml
extractor: memory_block  # Extract from agent memory
extractor_config:
  block_label: human  # Human memory block
```

Agent updates memory block “human” to: “User’s name is Alice”
Extracted: “User’s name is Alice”
Next Steps
Section titled “Next Steps”- Built-in Extractors Reference - Complete extractor documentation
- Custom Extractors Guide - Write your own extractors
- Graders - How to use extractors with graders