Targets
A target is the agent you’re evaluating. In Letta Evals, the target configuration determines how agents are created, accessed, and tested.
When to use each approach:
- `agent_file` - Pre-configured agents saved as `.af` files (most common)
- `agent_id` - Testing existing agents or multi-turn conversations with state
- `agent_script` - Dynamic agent creation with per-sample customization
The target configuration specifies how to create or access the agent for evaluation.
Target Configuration
All targets have a kind field (currently only agent is supported):
```yaml
target:
  kind: agent  # Currently only "agent" is supported
  # ... agent-specific configuration
```

Agent Sources
You must specify exactly ONE of these:
agent_file
Path to a .af (Agent File) to upload:
```yaml
target:
  kind: agent
  agent_file: path/to/agent.af      # Path to .af file
  base_url: https://api.letta.com   # Letta server URL
```

The agent file will be uploaded to the Letta server and a new agent created for the evaluation.
agent_id
ID of an existing agent on the server:
```yaml
target:
  kind: agent
  agent_id: agent-123-abc           # ID of existing agent
  base_url: https://api.letta.com   # Letta server URL
```

agent_script
Path to a Python script with an agent factory function for programmatic agent creation:
```yaml
target:
  kind: agent
  agent_script: create_agent.py:create_inventory_agent  # script.py:function_name
  base_url: https://api.letta.com                        # Letta server URL
```

Format: `path/to/script.py:function_name`
The function must be decorated with @agent_factory and have the signature async (client: AsyncLetta, sample: Sample) -> str:
```python
from letta_client import AsyncLetta, CreateBlock
from letta_evals.decorators import agent_factory
from letta_evals.models import Sample


@agent_factory
async def create_inventory_agent(client: AsyncLetta, sample: Sample) -> str:
    """Create and return the agent ID for this sample."""
    # Access custom arguments from the dataset
    item = sample.agent_args.get("item", {})

    # Create agent with sample-specific configuration
    agent = await client.agents.create(
        name="inventory-assistant",
        memory_blocks=[
            CreateBlock(
                label="item_context",
                value=f"Item: {item.get('name', 'Unknown')}",
            )
        ],
        agent_type="letta_v1_agent",
        model="openai/gpt-4.1-mini",
        embedding="openai/text-embedding-3-small",
    )

    return agent.id
```

Key features:
- Creates a fresh agent for each sample
- Can customize agents using `sample.agent_args` from the dataset (see the sketch at the end of this section)
- Allows testing agent creation logic itself
- Useful when you don’t have pre-saved agent files
When to use:
- Testing agent creation workflows
- Dynamic per-sample agent configuration
- Agents that need sample-specific memory or tools
- Programmatic agent testing
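To make the `agent_args` flow concrete, here is a sketch of a dataset row that could drive the factory above. Only `agent_args` (and the nested `item` it carries) comes from this page; the JSONL layout and the `input` field name are illustrative assumptions, so check the Datasets guide for the exact schema your version expects.

```json
{"input": "How many Blue Widgets are in stock?", "agent_args": {"item": {"name": "Blue Widget", "sku": "BW-001"}}}
```

At runtime the factory receives this row as `sample`, reads `sample.agent_args["item"]`, and seeds the agent's `item_context` memory block with the item name.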
Connection Configuration
base_url
Letta server URL:
```yaml
target:
  base_url: http://localhost:8283   # Local Letta server
  # or
  base_url: https://api.letta.com   # Letta Cloud
```

Default: https://api.letta.com
api_key
API key for authentication (required for Letta Cloud):
```yaml
target:
  api_key: your-api-key-here   # Required for Letta Cloud
```

Or set via environment variable:
```bash
export LETTA_API_KEY=your-api-key-here
```

project_id
Letta project ID (for Letta Cloud):
```yaml
target:
  project_id: proj_abc123   # Letta Cloud project
```

Or set via environment variable:
```bash
export LETTA_PROJECT_ID=proj_abc123
```

timeout
Request timeout in seconds:
```yaml
target:
  timeout: 300.0   # Request timeout (5 minutes)
```

Default: 300 seconds
Multi-Model Evaluation
Test the same agent across different models:
model_configs
List of model configuration names from JSON files:
```yaml
target:
  kind: agent
  agent_file: agent.af
  model_configs: [gpt-4o-mini, claude-3-5-sonnet]   # Test with both models
```

The evaluation will run once for each model config. Model configs are JSON files in letta_evals/llm_model_configs/.
model_handles
List of model handles (cloud-compatible identifiers):
```yaml
target:
  kind: agent
  agent_file: agent.af
  model_handles: ["openai/gpt-4o-mini", "anthropic/claude-3-5-sonnet"]   # Cloud model identifiers
```

Use this for Letta Cloud deployments.
Complete Examples
Local Development
```yaml
target:
  kind: agent
  agent_file: ./agents/my_agent.af   # Pre-configured agent
  base_url: http://localhost:8283    # Local server
```

Letta Cloud
```yaml
target:
  kind: agent
  agent_id: agent-cloud-123         # Existing cloud agent
  base_url: https://api.letta.com   # Letta Cloud
  api_key: ${LETTA_API_KEY}         # From environment variable
  project_id: proj_abc              # Your project ID
```

Multi-Model Testing
```yaml
target:
  kind: agent
  agent_file: agent.af              # Same agent configuration
  base_url: http://localhost:8283   # Local server
  model_configs: [gpt-4o-mini, gpt-4o, claude-3-5-sonnet]   # Test 3 models
```

Results will include per-model metrics:
```
Model: gpt-4o-mini       - Avg: 0.85, Pass: 85.0%
Model: gpt-4o            - Avg: 0.92, Pass: 92.0%
Model: claude-3-5-sonnet - Avg: 0.88, Pass: 88.0%
```

Programmatic Agent Creation
```yaml
target:
  kind: agent
  agent_script: setup.py:CustomAgentFactory   # Programmatic creation
  base_url: http://localhost:8283             # Local server
```

Environment Variable Precedence
Configuration values are resolved in this order (highest priority first):
1. CLI arguments (`--api-key`, `--base-url`, `--project-id`)
2. Suite YAML configuration
3. Environment variables (`LETTA_API_KEY`, `LETTA_BASE_URL`, `LETTA_PROJECT_ID`)
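For example, a value passed on the command line overrides both the suite YAML and any exported environment variable. The flags below are the ones listed above; the `letta-evals run` invocation is an assumption about the CLI entry point, so substitute whatever command your installation provides.

```bash
# Environment variable: lowest precedence of the three sources
export LETTA_BASE_URL=http://localhost:8283

# CLI flags win over both the env var and the suite YAML
# (the `letta-evals run` entry point is assumed; check your CLI's --help)
letta-evals run suite.yaml --base-url https://api.letta.com --project-id proj_abc123
```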
Agent Lifecycle and Testing Behavior
The way your agent is specified fundamentally changes how the evaluation runs:
With agent_file or agent_script: Independent Testing
Agent lifecycle:
- A fresh agent instance is created for each sample
- Agent processes the sample input(s)
- Agent remains on the server after the sample completes
Testing behavior: Each sample is an independent, isolated test. Agent state (memory, message history) does not carry over between samples. This enables parallel execution and ensures reproducible results.
Use cases:
- Testing how the agent responds to various independent inputs
- Ensuring consistent behavior across different scenarios
- Regression testing where each case should be isolated
- Evaluating agent responses without prior context
With agent_id: Sequential Script Testing
Agent lifecycle:
- The same agent instance is used for all samples
- Agent processes each sample in sequence
- Agent state persists throughout the entire evaluation
Testing behavior: The dataset becomes a conversation script where each sample builds on previous ones. Agent memory and message history accumulate, and earlier interactions affect later responses. Samples must execute sequentially.
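As a sketch, a conversation-script dataset might look like the JSONL below, where each line is one turn delivered to the same agent in order; the later questions can only be answered if earlier turns are remembered. The `input` field name and JSONL layout are illustrative assumptions, so see the Datasets guide for the exact format.

```json
{"input": "Hi, I'm Dana and I manage the Springfield warehouse."}
{"input": "What's my name?"}
{"input": "Which warehouse did I say I manage?"}
```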
Use cases:
- Testing multi-turn conversations with context
- Evaluating how agent memory evolves over time
- Simulating a single user session with multiple interactions
- Testing scenarios where context should accumulate
Critical Differences
| Aspect | agent_file / agent_script | agent_id |
|---|---|---|
| Agent instances | New agent per sample | Same agent for all samples |
| State isolation | Fully isolated | State carries over |
| Execution | Can run in parallel | Must run sequentially |
| Memory | Fresh for each sample | Accumulates across samples |
| Use case | Independent test cases | Conversation scripts |
| Reproducibility | Highly reproducible | Depends on execution order |
Validation
The runner validates:
- Exactly one of `agent_file`, `agent_id`, or `agent_script` is specified
- Agent files have the `.af` extension
- Agent script paths are valid
Next Steps
- Suite YAML Reference - Complete target configuration options
- Datasets - Using agent_args for sample-specific configuration
- Getting Started - Complete tutorial with target examples