Lighthouse for AI coding harnesses. Benchmark your Claude Code setup and get a score out of 100.
Backtest, score, and risk-guard Bitget trading agents on real candle data. Emits a reproducible scorecard and trade ledger you can hand to a judge.
Benchmark evaluation for Miyabi: SWE-bench Pro, AgentBench, HAL, Galileo
Benchmark comparison: runs agents against GAIA, AgentBench, and WebArena tasks and reports percentile vs. published baselines (v2 F-05)