Battle Mode
A head-to-head comparison of 2 to 4 models. Scores are shown side by side, benchmark by benchmark, with the winner marked ★ and the spread between the top and bottom score. Each battle has a shareable URL: paste the link and the recipient sees exactly the same battle.
How it works
- Higher is better on every benchmark (percentages and Elo).
- Comparable: only benchmarks where 2+ models have a published score count toward wins/losses.
- Exact tie: if two models share the same score, both are credited with a win and a tie.
- N/A: a model with no score for a benchmark is marked as abstained on it, which does not affect its win rate.
- Spread: the difference between the highest and lowest score on the benchmark, which hints at whether the gap is significant or marginal.
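The rules above can be sketched in a few lines of Python. This is a minimal illustration, not the page's actual implementation. The function names (battle, win_rate), the input shape (benchmark name mapped to per-model scores, with None for N/A) and the benchmark names used in the usage note are hypothetical. It also assumes that a tie only matters when it is for the top score, that every comparable model below the top score takes a loss, and that win rate is wins divided by wins plus losses.

```python
from __future__ import annotations


def battle(scores: dict[str, dict[str, float | None]]) -> dict:
    """Tally wins, ties, losses and abstentions per model, plus spread per benchmark.

    `scores` maps benchmark name -> {model slug: score, or None for N/A}.
    Higher is always better.
    """
    models = sorted({m for row in scores.values() for m in row})
    tally = {m: {"wins": 0, "ties": 0, "losses": 0, "abstained": 0} for m in models}
    spreads: dict[str, float] = {}

    for bench, row in scores.items():
        published = {m: s for m, s in row.items() if s is not None}

        # N/A: a model without a score abstains on this benchmark.
        for m in models:
            if m not in published:
                tally[m]["abstained"] += 1

        # Comparable: fewer than 2 published scores means no wins or losses.
        if len(published) < 2:
            continue

        best, worst = max(published.values()), min(published.values())
        spreads[bench] = best - worst  # Spread: max minus min

        winners = [m for m, s in published.items() if s == best]
        for m in published:
            if m in winners:
                tally[m]["wins"] += 1
                if len(winners) > 1:  # Exact tie: win + tie for each
                    tally[m]["ties"] += 1
            else:
                tally[m]["losses"] += 1  # assumption: non-winners lose

    return {"tally": tally, "spreads": spreads}


def win_rate(entry: dict[str, int]) -> float | None:
    """Assumed definition: wins / (wins + losses); abstentions are ignored."""
    decided = entry["wins"] + entry["losses"]
    return entry["wins"] / decided if decided else None
```

For example, calling battle({"bench_pct": {"a": 90.0, "b": 85.0, "c": None}}) records a win for a, a loss for b and an abstention for c, with a spread of 5.0 on that benchmark.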
The URL holds the model slugs (?models=a,b,c). There is no server: opening a shared link loads the data from the catalog at that point in time.
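As a rough sketch of how such a link could be built and read back, the following uses only Python's standard urllib.parse. The helper names, the example.com base URL and the duplicate-slug handling are assumptions made for illustration. Only the ?models=a,b,c shape and the 2 to 4 model limit come from the description above.

```python
from __future__ import annotations

from urllib.parse import parse_qs, urlencode, urlsplit

MIN_MODELS, MAX_MODELS = 2, 4  # a battle compares 2 to 4 models


def build_battle_url(base: str, slugs: list[str]) -> str:
    """Return `base` with a ?models=a,b,c query string (hypothetical helper)."""
    if not MIN_MODELS <= len(slugs) <= MAX_MODELS:
        raise ValueError("a battle needs 2 to 4 models")
    # safe="," keeps the commas readable instead of encoding them as %2C
    return f"{base}?{urlencode({'models': ','.join(slugs)}, safe=',')}"


def parse_battle_url(url: str) -> list[str]:
    """Extract the model slugs from a shared battle URL (hypothetical helper)."""
    query = parse_qs(urlsplit(url).query)
    raw = query.get("models", [""])[0]
    # Drop empty entries and, as an assumption, repeated slugs (order preserved)
    slugs = list(dict.fromkeys(s for s in raw.split(",") if s))
    if not MIN_MODELS <= len(slugs) <= MAX_MODELS:
        raise ValueError("a battle needs 2 to 4 models")
    return slugs
```

Because the link carries only the slugs, the scores themselves are looked up in the catalog after parsing; nothing about the battle is stored anywhere else.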