LLM evaluation framework with batch processing and data sources
Infer the probabilistic schema for a MongoDB collection.
TENET — The operating system for AI agent teams
AI agent evaluation CLI for races, replay, scorecards, and CI regression gates
Perl 5 interpreter powered by WebAssembly for JavaScript runtimes
Errors
Determines whether a Node file is a Module (`import`) or a Script (`require`)
Multi-format rendering of synthetic evaluation data — validate fixtures before they enter the eval pipeline.
Runflow Evals — project-local evals framework for Runflow agents (datasets, scorers, journey/conversation validation, LLM judge, viewer)
Shared infrastructure for pi-mind packages: pi spawn helper, .pi-mind path resolution.
Mathematical expression evaluator
RAG evaluation metric scorers: faithfulness, relevance, context precision/recall
minimal core for ssb clients
Atbench (agent-trajectory safety benchmark) evaluation harness for nexus-agents
Dataset loading, validation, generation, and versioning for RAG evals
Specification-driven development CLI for Claude Code — think before you build
Canonical contracts kernel for the Intent Eval Platform — TypeScript types, JSON Schemas, Zod validators, and state machines for the 13 canonical entities.
Node.js vm module for Gjs
Bash + markdown-skills framework for structured multi-agent development with Claude Code
Official CLI for Autousers — UX evaluation, autousers, and calibration from your terminal.
Statsig helps you move faster with feature gates (feature flags) and dynamic configs. It also lets you run A/B/n tests to validate your new features and understand their impact on your KPIs. If you're new to Statsig, check out our product and cre
> This package is deprecated. Use @tscircuit/eval instead with the `/webworker` import
45 specialized judges that evaluate AI-generated code for security, cost, and quality.
Mathematical expression solving library