Golden trajectory management, comparison, and curation for agent-eval-harness
Install a rigorous three-layer mobile UX evaluation skill for Claude — Nielsen's 10, WCAG 2.1 AA, and mobile platform accessibility (iOS HIG + Material Design).
One command. Find out if your AI agrees with itself. Statistically validated consensus measurement using multi-model AI raters.
Evaluate an expression and get what you expect.
DynamoDB storage adapter for Mastra
Alys local CLI runtime for autonomous AI data preparation.
Eval loader for webpack 2+
node --eval="require('@mafeinar/parchemod').init()" node --eval="require('@mafeinar/parchemod').clean()"
ESLint environments for MongoDB
OpenTelemetry observability (tracing, metrics, logging, dashboards) for agent-eval-harness
Evaluate ECMAScript expressions safely (without an `eval()` call)
Reproducible adversarial eval for AI agents. Deterministic scoring, no model judge. Vibefixing publishes the eval; we don't compete on the leaderboard.
A really simple message queue based on Redis
Internal evaluation logic for Formium
Headlessly browse websites with a deny-all report-only CSP, capture violations, and generate production-ready CSP policies
Safely fetch JavaScript source code from a remote GitHub repository and save it locally.
Deterministic test-enforcement harness — escape-scan, hash-pinning, CRAP, architecture checks, bias detection, Gherkin lint. Companion to the audit-tests and implement-tests Claude Code skills.
Helpers for bash-like shell scripting in JavaScript
Local-first tests for AGENTS.md and coding-agent instructions.
WebAssembly module for evaluating CEL (Common Expression Language) expressions in Node.js and browsers
AI Literacy Framework — Sanity Studio dashboard plugin
Execute JavaScript and create addons in Koishi
An interface to select and display a video from YouTube, Vimeo or a local file from Directus
Calibration-first LLM evaluation library for Node/TypeScript