Reasoning

MMLU-Pro

A harder version of MMLU, with more difficult questions and 10 answer options per question instead of 4.

32 models published a score
#   Model                 Company          Score
1   Claude Opus 4.7       Anthropic        91.5
2   Claude Opus 4.6       Anthropic        90.5
3   Claude Opus 4.5       Anthropic        90.0
4   Gemini 3 Pro          Google DeepMind  89.8
5   Doubao Seed 2.0 Lite  ByteDance        87.7
6   DeepSeek V4 Pro       DeepSeek         87.5
7   Kimi K2.5             Moonshot AI      87.1
8   Grok 4                xAI              87.0
9   Doubao Seed 2.0 Pro   ByteDance        87.0
10  Qwen3-Max-Thinking    Alibaba          85.7
11  Gemma 4 (31B dense)   Google DeepMind  85.2
12  Qwen3.6-35B-A3B       Alibaba          85.2
13  DeepSeek V3.2         DeepSeek         85.0
14  K-EXAONE 236B-A23B    LG AI Research   83.8
15  Nemotron 3 Super      Nvidia           83.3
16  EXAONE 4.5 33B        LG AI Research   83.3
17  Gemma 4 26B-A4B       Google DeepMind  82.6
18  Llama 4 Behemoth      Meta             82.2
19  Nova 2 Lite           Amazon           80.9
20  Llama 4 Maverick      Meta             80.5
21  Nemotron 3 Nano       Nvidia           78.3
22  Mistral Small 4       Mistral AI       78.0
23  Mistral Large 3       Mistral AI       78.0
24  GPT-5.2               OpenAI           75.4
25  Llama 4 Scout         Meta             74.3
26  MiniMax M2.5          MiniMax          74.0
27  Nova Premier          Amazon           73.3
28  Gemma 4 E4B           Google DeepMind  69.4
29  Reka Flash 3.1        Reka             66.9
30  Reka Flash 3          Reka             65.0
31  Gemma 4 E2B           Google DeepMind  60.0
32  Jamba 1.7 Large       AI21 Labs        57.7