Results for agent-eval

Found in 4 of 7 ecosystems · npm 1–24 of 69,617 · 11 matches across other registries

How we search: free-text search on npm, crates.io, RubyGems, NuGet, and Maven; exact-name lookup only on PyPI and Go.

npm matches

Showing 24 of 69,617 · JavaScript

agent-eval v0.0.1

npm

No description provided.

Aging — last published 9 months ago — check before adopting.

@reaatech/agent-eval-harness-types v0.1.0

npm

Shared domain types and Zod schemas for agent-eval-harness

Actively maintained.

@reaatech/agent-eval-harness-trajectory v0.1.0

npm

Trajectory loading, evaluation, and comparison for agent-eval-harness

Actively maintained.

@reaatech/agent-eval-harness-cost v0.1.0

npm

Cost tracking, budget management, and reporting for agent-eval-harness

Actively maintained.

@reaatech/agent-eval-harness-latency v0.1.0

npm

Latency monitoring, SLA enforcement, and optimization analysis for agent-eval-harness

Actively maintained.

@reaatech/agent-eval-harness-golden v0.1.0

npm

Golden trajectory management, comparison, and curation for agent-eval-harness

Actively maintained.

@reaatech/agent-eval-harness-observability v0.1.0

npm

OpenTelemetry observability (tracing, metrics, logging, dashboards) for agent-eval-harness

Actively maintained.

@reaatech/agent-eval-harness-suite v0.1.0

npm

Orchestrated evaluation suite runner with results aggregation for agent-eval-harness

Actively maintained.

@reaatech/agent-eval-harness-tool-use v0.1.0

npm

Tool-use validation (selection, schema compliance, result verification) for agent-eval-harness

Actively maintained.

@reaatech/agent-eval-harness-judge v0.1.0

npm

Provider-agnostic LLM-as-judge with calibration and consensus for agent-eval-harness

Actively maintained.

@tangle-network/agent-eval v0.85.0

npm

Evaluate and improve AI agents from runs, traces, judges, and feedback. Compare candidates, cluster failures, measure lift, and gate releases.

Actively maintained.

@iris-eval/mcp-server v0.4.2

npm

The agent eval standard for MCP. Score every agent output for quality, safety, and cost.

Actively maintained.

@vercel/agent-eval-playground v0.1.3

npm

Web-based playground for browsing agent-eval experiment results

Actively maintained.

@reaatech/agent-eval-harness-gate v0.1.0

npm

CI regression gates, threshold checks, and JUnit/GitHub integration for agent-eval-harness

Actively maintained.

@reaatech/agent-eval-harness-mcp-server v0.1.0

npm

Three-layer MCP tool server (judge, suite, gate) for agent-eval-harness

Actively maintained.

@reaatech/agent-eval-harness-cli v0.1.0

npm

CLI interface for agent-eval-harness with eval, judge, compare, gate, golden, report, and serve commands

Actively maintained.

@vercel/agent-eval v1.0.0

npm

Framework for testing AI coding agents in isolated sandboxes

Actively maintained.

@wundr.io/agent-eval v1.0.38

npm

Agent evaluation framework with LLM-based grading for AI agent quality assessment

Actively maintained.

@plaited/agent-eval-harness v1.0.0

npm

General-purpose eval harness for running trials against CLI agents

Actively maintained.

@aumos/agent-eval v0.1.0

npm

TypeScript client for the AumOS agent-eval evaluation framework — benchmarks, metrics, and run comparisons

Actively maintained.

static-eval v2.1.1

npm

evaluate statically-analyzable expressions

Abandoned. Last published 2 years ago.

node-agent-eval v0.1.0

npm

Compare coding agents head-to-head. Pass rate, cost, time, consistency — one command.

Actively maintained.

agent-eval-harness v0.1.0

npm

Static + schema + routing + spawn-fixture eval harness for *.md subagents (Claude Code, etc.). Catches description bloat, fence-mimicry, low routing margin, and schema regressions before they ship.

Actively maintained.

@ls-stack/agent-eval v0.61.0

npm

No description provided.

Actively maintained.