Runs domain agents and automates improving them from their own traces — chat turns and loop topologies, with an analyst→prompt/knowledge→eval-gated-ship self-improvement loop.
Stringify is to `eval` as `JSON.stringify` is to `JSON.parse`
CLI for Vally — the evaluation platform for AI agents
Smaller than base64, uses only ASCII, and runs in the web browser.
Statsig helps you move faster with feature gates (feature flags) and/or dynamic configs. It also lets you run A/B/n tests to validate your new features and understand their impact on your KPIs. If you're new to Statsig, check out our product and cre
API called by @shexjs/validator to get a neighborhood (arcs in and out of a node)
Agent evaluation framework with LLM-based grading for AI agent quality assessment
Remix Icon is a set of open source, neutral-style system symbols elaborately crafted for designers and developers. All of the icons are free for both personal and commercial use.
asciidoctor report support for querying extensions
General-purpose eval harness for running trials against CLI agents
Evaluation of DMN 1.1 decision tables, limited to S-FEEL (Simple Friendly Enough Expression Language)
JSON Eval RS core JavaScript wrapper (internal package - not published)
A minimal polyfill of `setImmediate` for modern browsers, using `window.postMessage`, and `MessageChannel` in workers.
Evaluation SDK for AgentV - build custom code judges
A mock version of Redis EVAL to test Lua scripts
A CLI to evaluate MCP server performance
`npm add jsr:@patchwork/annotations`
Retrieval latency ladder benchmarks + CI regression gates for @remnic/core
Usage, cost, and response telemetry primitives for Agent Assistant
Interactive evaluation pipeline for AI-generated data visualization pages
AI provider ranking: benchmark ingestion, scoring, and model comparison
A very simple eval-in-context function.
Vitest-based evaluation framework for agents, models, and more.