Tool Graders

Tool graders use Python functions to programmatically evaluate submissions. They’re ideal for deterministic, rule-based evaluation.

Tool graders:

  • Execute Python functions that take (sample, submission) and return a GradeResult
  • Are fast and deterministic
  • Don’t require external API calls
  • Can implement any custom logic
graders:
  my_metric:
    kind: tool
    function: exact_match # Function name
    extractor: last_assistant # What to extract from trajectory
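
To make the (sample, submission) -> GradeResult contract described above concrete, here is a minimal sketch of a custom grading function. The `Sample` and `GradeResult` classes below are hypothetical stand-ins defined only so the example runs on its own. The framework's real types, their fields, and the way a custom function is registered may differ. `word_count_under_limit` is likewise an illustrative name, not a built-in.

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical stand-ins for the framework's own types (assumed fields).
@dataclass
class Sample:
    input: str
    ground_truth: Optional[str] = None


@dataclass
class GradeResult:
    score: float          # assumed to lie in [0.0, 1.0]
    rationale: str = ""   # assumed free-text explanation


def word_count_under_limit(sample: Sample, submission: str) -> GradeResult:
    """Illustrative rule: pass if the submission has at most 50 words."""
    limit = 50
    n_words = len(submission.split())
    score = 1.0 if n_words <= limit else 0.0
    return GradeResult(score=score, rationale=f"{n_words} words (limit {limit})")
```

Because the function depends only on its arguments, it is deterministic and needs no external API calls, which matches the properties listed above.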

exact_match checks whether the submission exactly matches the ground truth (case-sensitive, whitespace-trimmed).

graders:
  accuracy:
    kind: tool
    function: exact_match
    extractor: last_assistant

Requires: ground_truth in dataset | Score: 1.0 if exact match, 0.0 otherwise
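
The comparison described for exact_match can be sketched in plain Python. This is an approximation of the documented behavior, not the library's implementation. It assumes that leading and trailing whitespace is trimmed on both sides before comparing, and `exact_match_score` is a hypothetical helper name.

```python
def exact_match_score(submission: str, ground_truth: str) -> float:
    """Case-sensitive equality after trimming surrounding whitespace (assumed on both sides)."""
    return 1.0 if submission.strip() == ground_truth.strip() else 0.0
```

Under these assumptions, `exact_match_score("  Paris\n", "Paris")` scores 1.0, while `exact_match_score("paris", "Paris")` scores 0.0 because case is preserved.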

contains checks whether the submission contains the ground truth as a substring (case-insensitive).

graders:
  contains_answer:
    kind: tool
    function: contains
    extractor: last_assistant

Requires: ground_truth in dataset | Score: 1.0 if found, 0.0 otherwise
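
A case-insensitive containment check of the kind described for contains might look like the sketch below. It assumes simple lowercasing of both strings, and the built-in may normalize text differently. `contains_score` is a hypothetical name used for illustration.

```python
def contains_score(submission: str, ground_truth: str) -> float:
    """Return 1.0 if ground_truth occurs anywhere in submission, ignoring case."""
    return 1.0 if ground_truth.lower() in submission.lower() else 0.0
```

This kind of check suits free-form answers: `contains_score("The capital is PARIS.", "Paris")` scores 1.0 even though the submission carries extra words and different casing.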

regex_match checks whether the submission matches a regex pattern stored in the ground truth.

graders:
  pattern:
    kind: tool
    function: regex_match
    extractor: last_assistant

Score: 1.0 if pattern matches, 0.0 otherwise
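
The regex check can be approximated with Python's standard `re` module. The sketch assumes the pattern may match anywhere in the submission (`re.search`). Whether the built-in instead anchors at the start (`re.match`) or requires a full match is not stated here, so treat that choice as an assumption. `regex_match_score` is a hypothetical name.

```python
import re


def regex_match_score(submission: str, pattern: str) -> float:
    """Return 1.0 if the pattern (taken from ground truth) is found in the submission."""
    # Assumption: search anywhere in the text rather than anchoring at the start.
    return 1.0 if re.search(pattern, submission) else 0.0
```

For example, with the pattern `r"#\d{5}"` a submission such as `"Order #12345 shipped"` scores 1.0, and a submission with no five-digit order number scores 0.0.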

ascii_printable_only validates that every character in the submission is printable ASCII.

graders:
  ascii_check:
    kind: tool
    function: ascii_printable_only
    extractor: last_assistant

Score: 1.0 if all characters are printable ASCII, 0.0 otherwise
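
One way to express the printable-ASCII rule is to test each character's code point. The sketch assumes "printable ASCII" means code points 32 through 126 (space through tilde), so newlines and tabs would fail. The built-in may treat whitespace control characters differently. `ascii_printable_score` is a hypothetical name.

```python
def ascii_printable_score(submission: str) -> float:
    """Return 1.0 only if every character is in the range 0x20..0x7E."""
    # Assumption: newline, tab, and other control characters count as non-printable.
    return 1.0 if all(32 <= ord(ch) <= 126 for ch in submission) else 0.0
```

Note that an empty submission scores 1.0 in this sketch, since there are no characters to violate the rule.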