Skill evaluation framework for Claude agents: static analysis and live agent testing
CLI benchmark system for tracking skill versions, scoring performance, and comparing improvements
Standalone skill test runner for AI agent skills with deterministic fixtures, assertions, and reports.