Agent Evaluation + Regression Harness
v1.0.0 · Last updated 2/17/2026
## Overview

Prove the agent still does the job after changes. The harness has five parts: eval cases, a rubric, batch grading, a version diff, and a release gate.

## Outcomes

- Build eval cases (NDJSON)
- Design rubric.json + hard_fails.json
- Run batch grading
- Diff results across versions
- Gate release on pass criteria
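
The five outcomes above can be sketched end to end in a few dozen lines. This is a minimal illustration, not the harness's actual format: the case fields (`id`, `input`, `must_include`), the rubric keys (`pass_threshold`, `weights`, `max_words`), the `forbidden_substrings` key for hard fails, the 0.9 gate threshold, and every function name are assumptions made for the example. The agent outputs are hard-coded stand-ins for two versions of an agent.

```python
import json

# Hypothetical NDJSON eval cases: one JSON object per line.
CASES_NDJSON = """\
{"id": "refund-01", "input": "I want a refund", "must_include": ["refund policy"]}
{"id": "greet-01", "input": "hello", "must_include": ["hello"]}
"""

# Hypothetical contents of rubric.json and hard_fails.json.
RUBRIC = {"pass_threshold": 0.7, "weights": {"coverage": 0.8, "brevity": 0.2}, "max_words": 50}
HARD_FAILS = {"forbidden_substrings": ["as an ai", "ssn:"]}


def load_cases(text):
    """Parse NDJSON, skipping blank lines."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def grade(case, output, rubric, hard_fails):
    """Score one output; any hard-fail hit zeroes the case regardless of rubric."""
    lowered = output.lower()
    hits = [s for s in hard_fails["forbidden_substrings"] if s in lowered]
    if hits:
        return {"id": case["id"], "score": 0.0, "passed": False, "hard_fail": hits}
    required = case["must_include"]
    coverage = sum(p.lower() in lowered for p in required) / len(required) if required else 1.0
    brevity = 1.0 if len(output.split()) <= rubric["max_words"] else 0.0
    weights = rubric["weights"]
    score = round(weights["coverage"] * coverage + weights["brevity"] * brevity, 3)
    return {"id": case["id"], "score": score,
            "passed": score >= rubric["pass_threshold"], "hard_fail": []}


def run_batch(cases, outputs, rubric, hard_fails):
    """Grade every case; returns a dict keyed by case id."""
    return {c["id"]: grade(c, outputs[c["id"]], rubric, hard_fails) for c in cases}


def diff_runs(baseline, candidate):
    """List cases that flipped pass->fail (regressions) or fail->pass (fixes)."""
    missing = {"passed": False}
    regressions = sorted(cid for cid, r in baseline.items()
                         if r["passed"] and not candidate.get(cid, missing)["passed"])
    fixes = sorted(cid for cid, r in candidate.items()
                   if r["passed"] and not baseline.get(cid, missing)["passed"])
    return {"regressions": regressions, "fixes": fixes}


def release_gate(candidate, diff, min_pass_rate=0.9):
    """Ship only if pass rate is high enough, no hard fails, and no regressions."""
    pass_rate = sum(r["passed"] for r in candidate.values()) / len(candidate)
    no_hard_fails = not any(r["hard_fail"] for r in candidate.values())
    return pass_rate >= min_pass_rate and no_hard_fails and not diff["regressions"]


cases = load_cases(CASES_NDJSON)
# Stand-in outputs for two agent versions.
v1_outputs = {"refund-01": "Our refund policy allows returns within 30 days.",
              "greet-01": "Hello! How can I help?"}
v2_outputs = {"refund-01": "As an AI, I cannot discuss that.",
              "greet-01": "Hello there."}

baseline = run_batch(cases, v1_outputs, RUBRIC, HARD_FAILS)
candidate = run_batch(cases, v2_outputs, RUBRIC, HARD_FAILS)
diff = diff_runs(baseline, candidate)
ship = release_gate(candidate, diff)
print(diff, "ship:", ship)
```

Keeping hard fails separate from the weighted rubric means a single disqualifying output cannot be averaged away by high scores elsewhere, which is one plausible reason to keep `hard_fails.json` as its own file.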