modules.com is in beta and open to partnerships & joint ventures.

cross-ecosystem search · live

Results for judge-cli

Found in 3 of 7 ecosystems · npm: 1–24 of 276,503 · 5 matches across other registries

How we search: free-text search on npm, crates.io, RubyGems, NuGet and Maven. PyPI and Go support exact-name lookup only.

npm matches

Showing 24 of 276,503 · JavaScript

judge-cli v0.1.6

npm

AI-Powered Code Quality Assistant utilizing parallel specialized expert judges.

Maintained. Actively maintained.

psoj v1.3.1

npm

problem solving offline judge CLI

Abandoned. Last published 4 years ago.

ya-oj-cli v0.2.3

npm

yet another online judge cli

Abandoned. Last published 3 years ago.

@reaatech/llm-judge-cli v0.1.0

npm

CLI for LLM Judge Toolkit — evaluate and calibrate commands

Maintained. Actively maintained.

openevals v0.2.0

npm

Much like tests in traditional software, evals are an important part of bringing LLM applications to production. The goal of this package is to help provide a starting point for you to write evals for your LLM applications, from which you can write more c

Worth a look. Actively maintained and growing.

werewolf-judge-cdn v0.0.0-gf0582ee8

npm

Runtime asset loader and manifest for the WerewolfJudge web app — fonts, audio sprites, and image bundles

Maintained. Actively maintained.

eval-bench v0.21.1

npm

Benchmark Claude Code plugins by A/B comparing plugin versions with LLM-judged evaluation prompts.

Maintained. Actively maintained.

vitest-evals v0.11.0

npm

Harness-backed AI testing on top of Vitest.

Maintained. Actively maintained.

@wifo/factory-harness v0.0.14

npm

Scenario runner for software-factory specs — runs `test:` satisfaction lines and scores `judge:` lines via LLM

Maintained. Actively maintained.

@lythos/test-utils v0.16.0

npm

Badges: coverage 94% · CI: 78 unit tests · arch: intent/plan/execute · LLM Audit …

Maintained. Actively maintained.

@blazediff/agent v0.6.0

npm

Agentic visual regression for BlazeDiff. Auto-discovers routes, captures deterministic screenshots, runs CI checks.

Maintained. Actively maintained.

hydrooj v5.0.1

npm

No description provided.

Maintained. Actively maintained.

@fede0089/skill-eval v3.0.2

npm

CLI to evaluate agent skills triggering and functionality

Maintained. Actively maintained.

@remnic/plugin-openclaw v9.3.595

npm

OpenClaw adapter for Remnic memory with bundled @remnic/core runtime

Maintained. Niche but actively maintained.

@wifo/factory-spec-review v0.0.14

npm

LLM-judged spec quality reviewer — runs subscription-paid `claude -p` judges against software-factory specs and emits findings in factory-spec-lint's output format

Maintained. Actively maintained.

mongodb-assistant-eval v0.0.8

npm

Evaluation library for the MongoDB Assistant API.

Maintained. Actively maintained.

@forwardimpact/libeval v0.1.54

npm

Agent evaluation framework — prove whether agent changes improved outcomes with reproducible evidence.

Maintained. Actively maintained.

@lythos/skill-arena v0.16.0

npm

Skill Arena — benchmark skill effectiveness with controlled-variable comparison

Maintained. Actively maintained.

@tarpit/judge v2.1.1

npm

Runtime type validation and assertion utilities

Maintained. Actively maintained.

@riddledc/riddle-proof v0.8.31

npm

Reusable Riddle Proof contracts and helpers for evidence-backed agent changes.

Maintained. Actively maintained.

logrotator for egg

Abandoned. Last published over a year ago.

propline-cli v0.7.0

npm

Command-line interface for the PropLine player props betting odds API. Wraps the propline SDK with pretty-printed tables and JSON output.

Maintained. Actively maintained.

@reactive-agents/eval v0.11.1

npm

Evaluation framework for Reactive Agents — LLM-as-judge scoring, regression detection, dataset loading

Maintained. Actively maintained.

mahout-bench v1.0.1

npm

CLI benchmark for measuring and mitigating sycophancy in LLMs. Supports multi-provider execution, configurable judges, and long-running evaluation campaigns.

Maintained. Actively maintained.