Shared infrastructure for pi-mind packages: pi spawn helper, .pi-mind path resolution.
Runflow Evals — project-local evals framework for Runflow agents (datasets, scorers, journey/conversation validation, LLM judge, viewer)
Atbench (agent-trajectory safety benchmark) evaluation harness for nexus-agents
Specification-driven development CLI for Claude Code — think before you build
Execute a string of JavaScript using Node.js and return the global variable values and functions.
Node.js vm module for Gjs
Bash + markdown-skills framework for structured multi-agent development with Claude Code
45 specialized judges that evaluate AI-generated code for security, cost, and quality.
Mathematical expression solving library
Mathematical expression evaluator
Semantically a dialect of ClojureScript. Built with Rust. Compiles to JavaScript ES Modules.
Statsig helps you move faster with feature gates (feature flags) and dynamic configs. It also lets you run A/B/n tests to validate new features and understand their impact on your KPIs. If you're new to Statsig, check out our product and cre
TypeScript SDK for content evaluation
Canonical contracts kernel for the Intent Eval Platform — TypeScript types, JSON Schemas, Zod validators, and state machines for the 13 canonical entities.
A verification toolchain for TypeScript — generates Lean 4 or Dafny from annotated TS
Cost tracking, pricing, budgeting, and reporting for RAG evaluations
Structured logging, OpenTelemetry tracing, and metrics for RAG evaluations
The command-line interface for Gadget
LLM-as-judge with calibration, consensus voting, and cost tracking
A virtual console for capturing and manipulating terminal output.
A Webpack plugin to transpile async module output using Babel. Allows transpiling top-level await to ES5.
Fast, compiled, eval-free data validator/transformer
Public npm package for the Eval Studio CLI
Much like tests in traditional software, evals are an important part of bringing LLM applications to production. The goal of this package is to help provide a starting point for you to write evals for your LLM applications, from which you can write more c