Evaluation CLI for AI Observability on Dynatrace
Corelay Mesh eval pipeline — test suites, LLM-judged scoring, deploy-gate thresholds.
WebGL module for Gjs
Shared domain types and Zod schemas for agent-eval-harness
Pragmatic eval framework for LLM features. Runs eval files as Bun tests with scorers, baselines, and reporting.
Universal eval-guard for AI coding agents — thin alias for @holdpoint/cli
Compile eval calls with string literals
Alias for eval global.
A Redis-backed leaky-bucket rate limiter
Statsig helps you move faster with feature gates (feature flags) and dynamic configs. It also allows you to run A/B/n tests to validate your new features and understand their impact on your KPIs. If you're new to Statsig, check out our product and cre
Open-source testing and regression detection framework for AI agents. Golden baseline diffing, CI/CD integration, works with LangGraph, CrewAI, OpenAI, Claude, HuggingFace, Ollama, and MCP.
Cloudflare Workers MCP server wrapper: ai-eval
Infer the probabilistic schema for a MongoDB collection.
JavaScript code execution context for the browser and wrapper around node vm module
REPL environment.
RAG evaluation metric scorers: faithfulness, relevance, context precision/recall
Determines whether a Node file is a Module (`import`) or a Script (`require`)
TENET — The operating system for AI agent teams
LLM evaluation framework with batch processing and data sources
AI agent evaluation CLI for races, replay, scorecards, and CI regression gates
Dataset loading, validation, generation, and versioning for RAG evals
Perl 5 interpreter powered by WebAssembly for JavaScript runtimes
Errors