Grade LLM outputs against checks files using an LLM judge
Eval pipeline orchestrator for Claude Code
Multi-turn Claude session driver