CLI Commands

The letta-evals command-line interface lets you run evaluations, validate configurations, and inspect available components.

Typical workflow:

  1. Validate your suite: `letta-evals validate suite.yaml`
  2. Run the evaluation: `letta-evals run suite.yaml --output results/`
  3. Check the exit code: `echo $?` (0 = passed, 1 = failed)

Run an evaluation suite.

```sh
letta-evals run <suite.yaml> [options]
```

  • suite.yaml: Path to the suite configuration file (required)

Save results to a directory.

```sh
letta-evals run suite.yaml --output results/
```

Creates:

  • results/header.json: Evaluation metadata
  • results/summary.json: Aggregate metrics and configuration
  • results/results.jsonl: Per-sample results (one JSON per line)
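
Since results.jsonl holds one JSON object per line, it can be loaded with a few lines of Python. A minimal sketch (`load_results` is a hypothetical helper, and no particular per-sample schema is assumed):

```python
import json

def load_results(path):
    """Load per-sample results from a JSONL file (one JSON object per line)."""
    results = []
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines
                results.append(json.loads(line))
    return results
```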

The --quiet flag shows only the pass/fail result.

```sh
letta-evals run suite.yaml --quiet
```

Output:

```
✓ PASSED
```

The --max-concurrent option sets the maximum number of concurrent sample evaluations (default: 15).

```sh
letta-evals run suite.yaml --max-concurrent 10
```

Higher values mean faster evaluation at the cost of more resource usage.
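
The bounded-concurrency pattern behind --max-concurrent can be illustrated with an asyncio semaphore. This is an illustration of the general pattern, not letta-evals' actual implementation:

```python
import asyncio

async def run_with_limit(coros, max_concurrent=15):
    """Await the given coroutines, allowing at most max_concurrent in flight."""
    sem = asyncio.Semaphore(max_concurrent)

    async def guarded(coro):
        async with sem:  # a slot frees up only when a sample finishes
            return await coro

    # gather preserves input order regardless of completion order
    return await asyncio.gather(*(guarded(c) for c in coros))
```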

The --api-key option sets the Letta API key, overriding the LETTA_API_KEY environment variable.

```sh
letta-evals run suite.yaml --api-key your-key
```

The --base-url option sets the Letta server base URL, overriding both the suite config and the environment variable.

```sh
letta-evals run suite.yaml --base-url https://api.letta.com
```

The --project-id option sets the Letta project ID for cloud deployments.

```sh
letta-evals run suite.yaml --project-id proj_abc123
```

The --cached option takes a path to cached results (JSONL) and re-grades those trajectories without re-running the agent.

```sh
letta-evals run suite.yaml --cached previous_results.jsonl
```

Use this to test different graders on the same agent trajectories.

The --num-runs option runs the evaluation multiple times to measure consistency (default: 1).

```sh
letta-evals run suite.yaml --num-runs 10
```

Output with multiple runs:

  • Each run creates a separate run_N/ directory with individual results
  • An aggregate_stats.json file contains statistics across all runs (mean, standard deviation, pass rate)
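
The statistics written across runs can be reproduced by hand from one overall score per run. A minimal sketch using Python's statistics module (`aggregate_stats` and the `gate` threshold are hypothetical; the exact fields letta-evals writes may differ):

```python
import statistics

def aggregate_stats(run_scores, gate=1.0):
    """Summarize one overall score per run: mean, standard deviation, pass rate.

    gate is a hypothetical pass threshold; letta-evals' real gate criteria
    come from the suite configuration.
    """
    return {
        "mean": statistics.mean(run_scores),
        "stdev": statistics.stdev(run_scores) if len(run_scores) > 1 else 0.0,
        "pass_rate": sum(s >= gate for s in run_scores) / len(run_scores),
    }
```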

Basic run:

```sh
letta-evals run suite.yaml # Run evaluation, show results in terminal
```

Save results:

```sh
letta-evals run suite.yaml --output evaluation-results/ # Save to directory
```

Letta Cloud:

```sh
letta-evals run suite.yaml \
  --base-url https://api.letta.com \
  --api-key $LETTA_API_KEY \
  --project-id proj_abc123
```

Quiet CI mode:

```sh
letta-evals run suite.yaml --quiet
if [ $? -eq 0 ]; then
  echo "Evaluation passed"
else
  echo "Evaluation failed"
  exit 1
fi
```

Exit codes:

  • 0: Evaluation passed (gate criteria met)
  • 1: Evaluation failed (gate criteria not met or error)

Validate a suite configuration without running it.

```sh
letta-evals validate <suite.yaml>
```

Checks:

  • YAML syntax is valid
  • Required fields are present
  • Paths exist
  • Configuration is consistent
  • Grader/extractor combinations are valid

Output on success:

```
✓ Suite configuration is valid
```

Output on error:

```
✗ Validation failed:
- Agent file not found: agent.af
- Grader 'my_metric' references unknown function
```

List all available extractors.

```sh
letta-evals list-extractors
```

Output:

```
Available extractors:
last_assistant - Extract the last assistant message
first_assistant - Extract the first assistant message
all_assistant - Concatenate all assistant messages
pattern - Extract content matching regex
tool_arguments - Extract tool call arguments
tool_output - Extract tool return value
after_marker - Extract content after a marker
memory_block - Extract from memory block (requires agent_state)
```

List all available grader functions.

```sh
letta-evals list-graders
```

Output:

```
Available graders:
exact_match - Exact string match with ground_truth
contains - Check if contains ground_truth
regex_match - Match regex pattern
ascii_printable_only - Validate ASCII-only content
```

Show help information.

```sh
letta-evals --help
```

Show help for a specific command:

```sh
letta-evals run --help
letta-evals validate --help
```

API key for Letta authentication.

```sh
export LETTA_API_KEY=your-key-here
```

Letta server base URL.

```sh
export LETTA_BASE_URL=https://api.letta.com
```

Letta project ID (for cloud).

```sh
export LETTA_PROJECT_ID=proj_abc123
```

OpenAI API key (for rubric graders).

```sh
export OPENAI_API_KEY=your-openai-key
```

Configuration values are resolved in this order (highest to lowest priority):

  1. CLI arguments (--api-key, --base-url, --project-id)
  2. Suite YAML configuration
  3. Environment variables
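
This precedence amounts to a first-non-None lookup. A minimal sketch (`resolve_setting` is a hypothetical helper, not part of letta-evals):

```python
import os

def resolve_setting(cli_value, suite_value, env_var):
    """Return the highest-priority value: CLI arg > suite YAML > environment."""
    if cli_value is not None:
        return cli_value
    if suite_value is not None:
        return suite_value
    return os.environ.get(env_var)
```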

GitHub Actions example:

```yaml
name: Run Evals
on: [push]
jobs:
  evaluate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - name: Install dependencies
        run: pip install letta-evals
      - name: Run evaluation
        env:
          LETTA_API_KEY: ${{ secrets.LETTA_API_KEY }}
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
        run: |
          letta-evals run suite.yaml --quiet --output results/
      - name: Upload results
        uses: actions/upload-artifact@v2
        with:
          name: eval-results
          path: results/
```

GitLab CI example:

```yaml
evaluate:
  script:
    - pip install letta-evals
    - letta-evals run suite.yaml --quiet --output results/
  artifacts:
    paths:
      - results/
  variables:
    LETTA_API_KEY: $LETTA_API_KEY
    OPENAI_API_KEY: $OPENAI_API_KEY
```