Benchmark Claude Code plugins by A/B comparing plugin versions with LLM-judged evaluation prompts.
A benchmark tool for measuring performance in xterm.js
Evaluation framework for Axl agentic workflows
Self-hosted enterprise agent harness for typed tools, agents, workflows, state, sandboxing, and telemetry.
CLI to evaluate agent skill triggering and functionality
Make your own error types!
EvalForge Evaluator
Oxc Parser Node API
Utility to dynamically load ESM modules in TypeScript CommonJS projects
Dead-simple execution of a script string as a function, with arguments, context, and error catching.
GYP file format parser in JS
Eval a JS string as if it were a file being required
The agent eval standard for MCP. Score every agent output for quality, safety, and cost.
The DevExpress analytics-core-cli package is a toolkit designed to help developers handle the devexpress-reporting and devexpress-dashboards packages. The toolset will be supplemented as needed.
Read a file and eval it
Gets the job done when JSON.stringify can't
Official CLI for the hosted Taplid audit API.
JSON in JavaScript
Registry and inspector for MCP servers — search, inspect, test from CLI or browser
A mikel plugin for evaluating JavaScript expressions.
Agent evaluation and benchmarking for AgentsKit.
Statsig helps you move faster with feature gates (feature flags), and/or dynamic configs. It also allows you to run A/B/n tests to validate your new features and understand their impact on your KPIs. If you're new to Statsig, check out our product and cre
ModelContextProtocol server for Figma
Compile and bundle your MDX files and their dependencies. FAST.