Dataset management and utilities for LLM Test Bench - load, validate, and manage test datasets
Core library for LLM Test Bench - comprehensive testing framework for Large Language Models with 65+ supported models across 14+ providers
A production-grade CLI for testing and benchmarking LLM applications with support for GPT-5, Claude Opus 4, Gemini 2.5, and 65+ models
N04: LLM Benchmark Suite with pluggable benchmarks and built-in math, logic, and factual Q&A
CLI tool for parsing, validating, and orchestrating AGM (Agent Graph Memory) files
Core library for parsing, validating, loading, and rendering AGM (Agent Graph Memory) files
Mock OpenAI backend for benchmarking crabtalk
SWE-bench (lite + full) adapter for open-harness
Benchmark suite for the OxiLLaMa inference engine
Benchmark harness for evaluating Zeph agent performance on standardized datasets
Mock OpenAI backend for benchmarking crabllm
The autonomous, self-improving AI agent. Single Rust binary. Every channel. Install with: cargo install opencrabs
No description provided.