Arize evals package
Axiom AI SDK provides: an API to wrap your AI calls with observability instrumentation; offline evals; online evals.
Harness AI Evals Service APIs integrated with React hooks
SDK for Hamming Evals Framework
AI SDK harness adapter for vitest-evals.
Runflow Evals — project-local evals framework for Runflow agents (datasets, scorers, journey/conversation validation, LLM judge, viewer)
OpenAI Agents SDK harness adapter for vitest-evals.
Much like tests in traditional software, evals are an important part of bringing LLM applications to production. The goal of this package is to help provide a starting point for you to write evals for your LLM applications, from which you can write more c
pi-ai harness adapter with tool replay for vitest-evals.
GitHub Actions reporting internals for vitest-evals runs.
MCP server unit testing, end-to-end (e2e) testing, and server evals
GitHub Action for evaluating MCP server tool calls using LLM-based scoring
Model-graded evals with TypeScript
Replacement for _.template (underscore or lodash) without unsafe evals.
No description provided.
Golden-prompt regression guard for the Maestro agent runtime. Static evals (mock-based, every CI) plus live evals (real Anthropic, scheduled) that catch the four Anthropic tool-calling traps before they ship.
Viteval UI - local UI for viewing the results of your evals
Inference.net CLI - manage training runs, evals, datasets, and inferences from your terminal
promptfoo custom provider for running evals against a Lobu agent
LLM inference provider evals
Guards, Evals & Observability for AI applications - works seamlessly with LangChain/LangGraph
Metrics for Open Evals
Harness-backed AI testing on top of Vitest.
Open source AI evaluation framework — LLM-as-judge + assertion-based evals for any AI app. CLI + MCP server.
Typed agent eval runtime, artifacts, and cargo-evals integration
A string calculator to compute formulas inside strings.
Deterministic authorization for AI agent tool calls
A powerful arithmetic and boolean expression evaluator
High-performance JSON Logic evaluator with schema validation and dependency tracking. Built on a blazing-fast Rust engine.
CLI for parsing, validating, linting and evaluating Sigma detection rules
A `cargo` subcommand designed to let people quickly and easily run Rust “scripts” which can make use of `cargo`'s package ecosystem.
Cargo subcommand for listing and running typed agent eval suites
Evaluation framework for swink-agent: trajectory tracing, golden path verification, and cost governance
Expression evaluator
This is a tool to evaluate or export code from Markdown files
Expression evaluator
A library for LLM evals
Library that saves eval'ed code to a file first, so that it can be inspected later (e.g., while debugging)
LLM evaluation engine for Rails.
Provides the DSPy::Evals runtime, concurrency, callbacks, and export helpers for benchmarking Ruby DSPy programs.
A secure, non-evaling end user template engine with aesthetic markup.
Interactive Ruby command-line tool for REPL (Read Eval Print Loop).
This ripl plugin allows you to evaluate multiple lines of Ruby code.
Eval
Continuous testing for Ruby with fork/eval
Use rspec's let() outside of rspec with concerns and functional operators
Tests strings of Ruby code for unauthorized patterns (exit, eval, ...)
Alternate eval that hooks all methods executed in the evaluated code