AI-Powered Code Quality Assistant utilizing parallel specialized expert judges.
Problem-solving offline judge CLI
Yet another online judge CLI
CLI for LLM Judge Toolkit — evaluate and calibrate commands
Much like tests in traditional software, evals are an important part of bringing LLM applications to production. The goal of this package is to help provide a starting point for you to write evals for your LLM applications, from which you can write more c
Runtime asset loader and manifest for the WerewolfJudge web app — fonts, audio sprites, and image bundles
Benchmark Claude Code plugins by A/B comparing plugin versions with LLM-judged evaluation prompts.
Harness-backed AI testing on top of Vitest.
Scenario runner for software-factory specs — runs `test:` satisfaction lines and scores `judge:` lines via LLM
Agentic visual regression for BlazeDiff. Auto-discovers routes, captures deterministic screenshots, runs CI checks.
No description provided.
CLI to evaluate agent skill triggering and functionality
OpenClaw adapter for Remnic memory with bundled @remnic/core runtime
LLM-judged spec quality reviewer — runs subscription-paid `claude -p` judges against software-factory specs and emits findings in factory-spec-lint's output format
Evaluation library for the MongoDB Assistant API.
Agent evaluation framework — prove whether agent changes improved outcomes with reproducible evidence.
Skill Arena — benchmark skill effectiveness with controlled-variable comparison
Runtime type validation and assertion utilities
Reusable Riddle Proof contracts and helpers for evidence-backed agent changes.
Log rotator for Egg
Command-line interface for the PropLine player props betting odds API. Wraps the propline SDK with pretty-printed tables and JSON output.
Evaluation framework for Reactive Agents — LLM-as-judge scoring, regression detection, dataset loading
CLI benchmark for measuring and mitigating sycophancy in LLMs. Supports multi-provider execution, configurable judges, and long-running evaluation campaigns.
No description provided.
No description provided.
No description provided.
No description provided.