45 specialized judges that evaluate AI-generated code for security, cost, and quality.
CLI wrapper for the Judges code review toolkit.
Coming soon.
Evaluation SDK for AgentV - build custom code judges
Autonomous AI engineer — observes, judges, builds, ships
LLM-judged spec quality reviewer — runs subscription-paid `claude -p` judges against software-factory specs and emits findings in factory-spec-lint's output format
Ohbem judges your Pokemon GO IVs.
CLI benchmark for measuring and mitigating sycophancy in LLMs. Supports multi-provider execution, configurable judges, and long-running evaluation campaigns.
TypeScript SDK and CLI for evaluating agentskills.io-style AI agent skills with LLM judges, baseline comparison, YAML config, JSONL logs, and HTML reports.
Splitifi Intelligence MCP — outcome predictions, judge profiles, and legal workflows powered by 4,819 ML models trained on 102M+ court records. Serves litigants, attorneys, judges, mediators, CDFAs, and litigation funders.
Suite execution: browser goals, Keys-backed judges, retries, artefacts.
Collects metrics and judges the health of a deployment
An MCP server that uses large language models (LLMs) as judges to evaluate the responses of other LLMs.
AI-Powered Code Quality Assistant utilizing parallel specialized expert judges.
Apps Machine — Selection Agent. Ranks app opportunities globally via dual-store (Apple App Store + Google Play) scraping, heuristic scoring, and Claude judges. Run `npx @apps-machine/selection-agent demo` for a 30s magical moment.
Judges suspicious Discord links
How good is your film? Sarah judges all
Pi extension that brings Hermes Agent's /goal (Ralph loop with judge) to Pi as /until-done. Pi self-judges every turn, runs verifyCommand to confirm done, and routes all CI/CD through mise across 18 language profiles.
Harness-backed AI testing on top of Vitest.
MCP server for Claude Code — expose the Versuz marketplace as native tools. Search, inspect, install, and battle 100k+ ranked SKILL.md and CLAUDE.md files inline. Daily benchmark with 3 frontier judges.
A library to extract information easily from various online judges.
agency-os core: MCP server, orchestrator, Minerva KG, observability, quality-gate judges. Imported by the @dvg-os/agency-os Claude Code plugin.
Secret access layer for cooperative AI agents — structured, policy-gated, audited credential access
Benchmark harness for zer: throughput, accuracy, and competitor-library comparison
Scope firewall and audit layer for AI coding agents
Cause-and-effect tester; help prototype a system before writing real code.
A judge library for online judge systems.
LFM2.5-based death-loop judge for MimirsWell agentic information flows
ONNX-based neural judge for zer, runs DeBERTa/MiniLM NLI models via ORT to adjudicate borderline record pairs
A Rust CLI tool for estimating success rates when using LLM judges for evaluation
Evaluation framework for swink-agent: trajectory tracing, golden path verification, and cost governance
Zero-shot entity resolution pipeline for Dutch-centric data, with GPU acceleration and neural judging
Multi-model deliberation CLI — 5 frontier LLMs debate, then Claude judges
Heuristic and LLM-based memory relevance scoring
A command-line tool that runs a collection of "judges" against a "factbase", modifying and updating it. It also helps print a factbase, merge it with another one, inspect it, and run automated tests for a set of judges.
A judge system for online judges. This gem works with the Wandbox API.
Easily add Judge client side validation to your SimpleForm forms.
Easily add Judge client side validation to your Formtastic forms.
Judges whether an object is contained in a collection.
The Git extension for online judges (Codeforces, etc.)
A collection of extensions for a factbase, helping the judges of Zerocracy manipulate the facts and create new ones
Container for C code extension that implements the West Point Bridge Contest Judge.
tcjudge offers a simple command-line tool that judges TopCoder solutions within a local environment.
OJAgent is a client for submitting solutions and querying their status at different online judges. It provides a uniform interface to many well-known online judges.
Detect and judge the face in a photo.
TD Critic uses rubocop to check your code based on the Ruby Style Guide.