Reasoning
ARC-AGI-2
The updated ARC challenge: novel, difficult abstract-reasoning tasks.
8 models have a published score
| # | Model | Company | Score (%) |
|---|---|---|---|
| 1 | GPT-5.5 | OpenAI | 85.0 |
| 2 | Gemini 3 Deep Think | Google DeepMind | 84.6 |
| 3 | Gemini 3.1 Pro | Google DeepMind | 77.1 |
| 4 | Claude Opus 4.6 | Anthropic | 68.8 |
| 5 | Claude Sonnet 4.6 | Anthropic | 60.4 |
| 6 | GPT-5.2 | OpenAI | 52.9 |
| 7 | Claude Opus 4.5 | Anthropic | 37.6 |
| 8 | Gemini 3 Pro | Google DeepMind | 31.1 |