CLI Commands
The letta-evals command-line interface lets you run evaluations, validate configurations, and inspect available components.
Typical workflow:
- Validate your suite:
  letta-evals validate suite.yaml
- Run the evaluation:
  letta-evals run suite.yaml --output results/
- Check the exit code:
  echo $?   # 0 = passed, 1 = failed
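Chained with &&, the whole loop fits in one command; this sketch assumes suite.yaml sits in the current directory:

letta-evals validate suite.yaml \
  && letta-evals run suite.yaml --output results/ \
  && echo "Evaluation passed"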
run

Run an evaluation suite.

letta-evals run <suite.yaml> [options]

Arguments

suite.yaml: Path to the suite configuration file (required)
Options

--output, -o

Save results to a directory.
letta-evals run suite.yaml --output results/

Creates:
- results/header.json: Evaluation metadata
- results/summary.json: Aggregate metrics and configuration
- results/results.jsonl: Per-sample results (one JSON object per line)
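As a quick way to inspect those files, you can filter the per-sample results with jq; the passed field name here is an assumption, so check your results.jsonl for the actual schema:

# Print only the failing samples from the per-sample results
jq -c 'select(.passed == false)' results/results.jsonl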
--quiet, -q

Quiet mode: only show the pass/fail result.
letta-evals run suite.yaml --quiet

Output:
✓ PASSED

--max-concurrent
Maximum concurrent sample evaluations. Default: 15

letta-evals run suite.yaml --max-concurrent 10

Higher values make the evaluation faster but increase resource usage.
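Conversely, if your model provider rate-limits you, dropping concurrency to 1 trades speed for reliability:

letta-evals run suite.yaml --max-concurrent 1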
--api-key

Letta API key (overrides the LETTA_API_KEY environment variable).

letta-evals run suite.yaml --api-key your-key

--base-url
Letta server base URL (overrides the suite config and environment variable).

letta-evals run suite.yaml --base-url https://api.letta.com

--project-id
Letta project ID for cloud deployments.

letta-evals run suite.yaml --project-id proj_abc123

--cached, -c
Path to cached results (JSONL) for re-grading trajectories without re-running the agent.

letta-evals run suite.yaml --cached previous_results.jsonl

Use this to test different graders on the same agent trajectories.
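A typical re-grading loop looks like this, where suite-v2.yaml is a hypothetical copy of the suite with a different grader:

# First pass: run the agent and save per-sample trajectories
letta-evals run suite.yaml --output results/

# Second pass: re-grade the saved trajectories without re-running the agent
letta-evals run suite-v2.yaml --cached results/results.jsonl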
--num-runs

Run the evaluation multiple times to measure consistency. Default: 1

letta-evals run suite.yaml --num-runs 10

Output with multiple runs:
- Each run creates a separate run_N/ directory with individual results
- An aggregate_stats.json file contains statistics across all runs (mean, standard deviation, pass rate)
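To skim the cross-run statistics afterwards, dump the aggregate file with jq; this assumes aggregate_stats.json lands at the top of the output directory, and its exact field names may differ by version:

letta-evals run suite.yaml --num-runs 5 --output stability/
jq . stability/aggregate_stats.json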
Examples
Basic run:
letta-evals run suite.yaml   # Run evaluation, show results in terminal

Save results:
letta-evals run suite.yaml --output evaluation-results/   # Save to directory

Letta Cloud:
letta-evals run suite.yaml \
  --base-url https://api.letta.com \
  --api-key $LETTA_API_KEY \
  --project-id proj_abc123

Quiet CI mode:
letta-evals run suite.yaml --quiet
if [ $? -eq 0 ]; then
  echo "Evaluation passed"
else
  echo "Evaluation failed"
  exit 1
fi

Exit Codes
0: Evaluation passed (gate criteria met)
1: Evaluation failed (gate criteria not met or error)
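Because of these exit codes, set -e is enough in most scripts, making the explicit check above optional:

set -e   # abort the script on any non-zero exit code
letta-evals run suite.yaml --quiet
echo "Gate passed, continuing pipeline"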
validate
Validate a suite configuration without running it.
letta-evals validate <suite.yaml>

Checks:
- YAML syntax is valid
- Required fields are present
- Paths exist
- Configuration is consistent
- Grader/extractor combinations are valid
Output on success:
✓ Suite configuration is valid

Output on error:
✗ Validation failed:
  - Agent file not found: agent.af
  - Grader 'my_metric' references unknown function
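In a repository with several suites, you can gate them all up front; the suites/ layout here is hypothetical:

# Validate every suite before running anything
for suite in suites/*.yaml; do
  letta-evals validate "$suite" || exit 1
done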
list-extractors

List all available extractors.
letta-evals list-extractors

Output:
Available extractors:
  last_assistant  - Extract the last assistant message
  first_assistant - Extract the first assistant message
  all_assistant   - Concatenate all assistant messages
  pattern         - Extract content matching regex
  tool_arguments  - Extract tool call arguments
  tool_output     - Extract tool return value
  after_marker    - Extract content after a marker
  memory_block    - Extract from memory block (requires agent_state)

list-graders
List all available grader functions.
letta-evals list-graders

Output:
Available graders:
  exact_match          - Exact string match with ground_truth
  contains             - Check if contains ground_truth
  regex_match          - Match regex pattern
  ascii_printable_only - Validate ASCII-only content

--help

Show help information.
letta-evals --help

Show help for a specific command:

letta-evals run --help
letta-evals validate --help

Environment Variables
LETTA_API_KEY

API key for Letta authentication.
export LETTA_API_KEY=your-key-here

LETTA_BASE_URL
Letta server base URL.
export LETTA_BASE_URL=https://api.letta.com

LETTA_PROJECT_ID
Letta project ID (for cloud).
export LETTA_PROJECT_ID=proj_abc123

OPENAI_API_KEY
OpenAI API key (for rubric graders).
export OPENAI_API_KEY=your-openai-key

Configuration Priority
Configuration values are resolved in this order (highest to lowest priority):
- CLI arguments (--api-key, --base-url, --project-id)
- Suite YAML configuration
- Environment variables
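For example, a CLI flag beats an environment variable; the URLs below are placeholders:

export LETTA_BASE_URL=http://localhost:8283
letta-evals run suite.yaml --base-url https://api.letta.com   # the flag wins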
Using in CI/CD
GitHub Actions
name: Run Evals
on: [push]

jobs:
  evaluate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2

      - name: Install dependencies
        run: pip install letta-evals

      - name: Run evaluation
        env:
          LETTA_API_KEY: ${{ secrets.LETTA_API_KEY }}
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
        run: |
          letta-evals run suite.yaml --quiet --output results/

      - name: Upload results
        uses: actions/upload-artifact@v2
        with:
          name: eval-results
          path: results/

GitLab CI
evaluate:
  script:
    - pip install letta-evals
    - letta-evals run suite.yaml --quiet --output results/
  artifacts:
    paths:
      - results/
  variables:
    LETTA_API_KEY: $LETTA_API_KEY
    OPENAI_API_KEY: $OPENAI_API_KEY

Debugging
Common Issues

Next Steps
Section titled “Next Steps”- Understanding Results - Interpreting evaluation output
- Suite YAML Reference - Complete configuration options
- Getting Started - Complete tutorial with examples