A production-grade CLI for testing and benchmarking LLM applications, supporting 65+ models including GPT-5, Claude Opus 4, and Gemini 2.5
Core library for LLM Test Bench: a comprehensive testing framework for large language models, supporting 65+ models across 14+ providers
Dataset utilities for LLM Test Bench: load, validate, and manage test datasets