Reasoning

GPQA-Diamond

Graduate-level (PhD-level) questions in physics, chemistry, and biology.

54 models with a published score
Rank | Model | Company | Score
1 | Claude Mythos Preview | Anthropic | 94.6
2 | GPT-5.4 Pro | OpenAI | 94.4
3 | Gemini 3.1 Pro | Google DeepMind | 94.3
4 | Claude Opus 4.7 | Anthropic | 94.2
5 | GPT-5.5 | OpenAI | 93.6
6 | GPT-5.2 Pro | OpenAI | 93.2
7 | GPT-5.4 | OpenAI | 92.8
8 | GPT-5.2 | OpenAI | 92.4
9 | Gemini 3 Pro | Google DeepMind | 91.9
10 | Claude Opus 4.6 | Anthropic | 91.3
11 | Kimi K2.6 | Moonshot AI | 90.5
12 | Gemini 3 Flash | Google DeepMind | 90.4
13 | DeepSeek V4 Pro | DeepSeek | 90.1
14 | Doubao Seed 2.0 Pro | ByteDance | 88.9
15 | GPT-5.4 mini | OpenAI | 88.0
16 | Grok 4 Heavy | xAI | 88.0
17 | Grok 4 | xAI | 88.0
18 | Kimi K2.5 | Moonshot AI | 87.6
19 | Qwen3-Max-Thinking | Alibaba | 87.4
20 | Claude Opus 4.5 | Anthropic | 87.0
21 | MiMo V2 Pro | Xiaomi | 87.0
22 | Gemini 3.1 Flash-Lite | Google DeepMind | 86.9
23 | Qwen3.6-35B-A3B | Alibaba | 86.0
24 | GLM-5 | Zhipu AI | 86.0
25 | DeepSeek V3.2 Speciale | DeepSeek | 85.7
26 | MiniMax M2.5 | MiniMax | 85.2
27 | Gemma 4 (31B dense) | Google DeepMind | 84.3
28 | MiMo V2 Flash | Xiaomi | 83.5
29 | GLM-4.6 | Zhipu AI | 82.9
30 | GPT-5.4 nano | OpenAI | 82.8
31 | DeepSeek V3.2 | DeepSeek | 82.4
32 | Gemma 4 26B-A4B | Google DeepMind | 82.3
33 | DeepSeek R1 0528 | DeepSeek | 81.0
34 | EXAONE 4.5 33B | LG AI Research | 80.5
35 | Nova 2 Lite | Amazon | 79.6
36 | Nemotron 3 Super | Nvidia | 79.4
37 | K-EXAONE 236B-A23B | LG AI Research | 79.1
38 | Qwen3-Max | Alibaba | 76.4
39 | Magistral Medium 1.2 | Mistral AI | 76.3
40 | Claude Sonnet 4.6 | Anthropic | 74.1
41 | Llama 4 Behemoth | Meta | 73.7
42 | Step-3 | StepFun | 73.0
43 | Nemotron 3 Nano | Nvidia | 73.0
44 | Mistral Small 4 | Mistral AI | 71.2
45 | Llama 4 Maverick | Meta | 69.8
46 | Gemma 4 E4B | Google DeepMind | 58.6
47 | Llama 4 Scout | Meta | 57.2
48 | Reka Flash 3 | Reka | 52.9
49 | Yi-Lightning | 01.AI | 50.9
50 | Command A | Cohere | 50.8
51 | Nova Pro | Amazon | 46.9
52 | Mistral Large 3 | Mistral AI | 43.9
53 | Gemma 4 E2B | Google DeepMind | 43.4
54 | Jamba 1.7 Large | AI21 Labs | 39.0