Skip to content

Development Tools

Testing & evals

Letta Evals

Systematic testing for stateful AI agents. Validate changes, prevent regressions, and ship with confidence.

Test agent memory, tool usage, multi-turn conversations, and state evolution with automated grading and pass/fail gates.

Core Concepts

Understand the building blocks of evaluations:

Suites - Configure your evaluation
Datasets - Define test cases
Targets - Specify the agent to test
Graders - Score agent outputs
Extractors - Extract content from responses
Gates - Set pass/fail criteria

Grading & Extraction

Choose how to score your agents:

Tool Graders - Fast, deterministic grading with Python functions
Rubric Graders - Flexible LLM-as-judge evaluation
Built-in Extractors - Pre-built content extractors
Multi-Metric Grading - Evaluate multiple dimensions

Advanced

Custom Graders - Write your own grading logic
Custom Extractors - Build custom extractors
Multi-Turn Conversations - Test memory and state
Suite YAML Reference - Complete configuration schema

Reference

CLI Commands - Command-line interface
Understanding Results - Interpret metrics
Troubleshooting - Common issues and solutions

Resources

GitHub Repository - Source code, issues, and contributions
PyPI Package - Install with pip install letta-evals