Frontier Benchmarks AI
Reasoning
MMLU-Pro
A harder successor to MMLU: more challenging questions, each with 10 answer options instead of 4.
32 models have a published score.
| # | Model | Company | Score |
|---|-------|---------|-------|
| 1 | Claude Opus 4.7 | Anthropic | 91.5 |
| 2 | Claude Opus 4.6 | Anthropic | 90.5 |
| 3 | Claude Opus 4.5 | Anthropic | 90.0 |
| 4 | Gemini 3 Pro | Google DeepMind | 89.8 |
| 5 | Doubao Seed 2.0 Lite | ByteDance | 87.7 |
| 6 | DeepSeek V4 Pro | DeepSeek | 87.5 |
| 7 | Kimi K2.5 | Moonshot AI | 87.1 |
| 8 | Grok 4 | xAI | 87.0 |
| 9 | Doubao Seed 2.0 Pro | ByteDance | 87.0 |
| 10 | Qwen3-Max-Thinking | Alibaba | 85.7 |
| 11 | Gemma 4 (31B dense) | Google DeepMind | 85.2 |
| 12 | Qwen3.6-35B-A3B | Alibaba | 85.2 |
| 13 | DeepSeek V3.2 | DeepSeek | 85.0 |
| 14 | K-EXAONE 236B-A23B | LG AI Research | 83.8 |
| 15 | Nemotron 3 Super | Nvidia | 83.3 |
| 16 | EXAONE 4.5 33B | LG AI Research | 83.3 |
| 17 | Gemma 4 26B-A4B | Google DeepMind | 82.6 |
| 18 | Llama 4 Behemoth | Meta | 82.2 |
| 19 | Nova 2 Lite | Amazon | 80.9 |
| 20 | Llama 4 Maverick | Meta | 80.5 |
| 21 | Nemotron 3 Nano | Nvidia | 78.3 |
| 22 | Mistral Small 4 | Mistral AI | 78.0 |
| 23 | Mistral Large 3 | Mistral AI | 78.0 |
| 24 | GPT-5.2 | OpenAI | 75.4 |
| 25 | Llama 4 Scout | Meta | 74.3 |
| 26 | MiniMax M2.5 | MiniMax | 74.0 |
| 27 | Nova Premier | Amazon | 73.3 |
| 28 | Gemma 4 E4B | Google DeepMind | 69.4 |
| 29 | Reka Flash 3.1 | Reka | 66.9 |
| 30 | Reka Flash 3 | Reka | 65.0 |
| 31 | Gemma 4 E2B | Google DeepMind | 60.0 |
| 32 | Jamba 1.7 Large | AI21 Labs | 57.7 |