Frontier Benchmarks AI
Reasoning
GPQA-Diamond
Graduate-level physics, chemistry, and biology: PhD-level questions.
54 models have published a score.
 #  Model                   Company          Score
 1  Claude Mythos Preview   Anthropic        94.6
 2  GPT-5.4 Pro             OpenAI           94.4
 3  Gemini 3.1 Pro          Google DeepMind  94.3
 4  Claude Opus 4.7         Anthropic        94.2
 5  GPT-5.5                 OpenAI           93.6
 6  GPT-5.2 Pro             OpenAI           93.2
 7  GPT-5.4                 OpenAI           92.8
 8  GPT-5.2                 OpenAI           92.4
 9  Gemini 3 Pro            Google DeepMind  91.9
10  Claude Opus 4.6         Anthropic        91.3
11  Kimi K2.6               Moonshot AI      90.5
12  Gemini 3 Flash          Google DeepMind  90.4
13  DeepSeek V4 Pro         DeepSeek         90.1
14  Doubao Seed 2.0 Pro     ByteDance        88.9
15  GPT-5.4 mini            OpenAI           88.0
16  Grok 4 Heavy            xAI              88.0
17  Grok 4                  xAI              88.0
18  Kimi K2.5               Moonshot AI      87.6
19  Qwen3-Max-Thinking      Alibaba          87.4
20  Claude Opus 4.5         Anthropic        87.0
21  MiMo V2 Pro             Xiaomi           87.0
22  Gemini 3.1 Flash-Lite   Google DeepMind  86.9
23  Qwen3.6-35B-A3B         Alibaba          86.0
24  GLM-5                   Zhipu AI         86.0
25  DeepSeek V3.2 Speciale  DeepSeek         85.7
26  MiniMax M2.5            MiniMax          85.2
27  Gemma 4 (31B dense)     Google DeepMind  84.3
28  MiMo V2 Flash           Xiaomi           83.5
29  GLM-4.6                 Zhipu AI         82.9
30  GPT-5.4 nano            OpenAI           82.8
31  DeepSeek V3.2           DeepSeek         82.4
32  Gemma 4 26B-A4B         Google DeepMind  82.3
33  DeepSeek R1 0528        DeepSeek         81.0
34  EXAONE 4.5 33B          LG AI Research   80.5
35  Nova 2 Lite             Amazon           79.6
36  Nemotron 3 Super        Nvidia           79.4
37  K-EXAONE 236B-A23B      LG AI Research   79.1
38  Qwen3-Max               Alibaba          76.4
39  Magistral Medium 1.2    Mistral AI       76.3
40  Claude Sonnet 4.6       Anthropic        74.1
41  Llama 4 Behemoth        Meta             73.7
42  Step-3                  StepFun          73.0
43  Nemotron 3 Nano         Nvidia           73.0
44  Mistral Small 4         Mistral AI       71.2
45  Llama 4 Maverick        Meta             69.8
46  Gemma 4 E4B             Google DeepMind  58.6
47  Llama 4 Scout           Meta             57.2
48  Reka Flash 3            Reka             52.9
49  Yi-Lightning            01.AI            50.9
50  Command A               Cohere           50.8
51  Nova Pro                Amazon           46.9
52  Mistral Large 3         Mistral AI       43.9
53  Gemma 4 E2B             Google DeepMind  43.4
54  Jamba 1.7 Large         AI21 Labs        39.0
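The table gives consecutive positions to models with identical scores (for example, GPT-5.4 mini, Grok 4 Heavy and Grok 4 all score 88.0 and sit at positions 15 to 17), and the tie-breaking rule is not stated here. If you would rather have tied models share a rank, a minimal Python sketch using standard competition ranking ("1224" ranking) could look like the following. The function `competition_ranks` and its `first_position` parameter are hypothetical names for illustration, not part of any site API; the rows are copied from positions 14 to 19 of the table.

```python
from typing import List, Tuple


def competition_ranks(
    rows: List[Tuple[str, float]], first_position: int = 1
) -> List[Tuple[int, str, float]]:
    """Hypothetical helper: rank (model, score) rows from highest to lowest score.

    Models with equal scores share the best rank of their group, and the next
    distinct score skips the positions the tie occupied ("1224" ranking).
    """
    # sorted() is stable even with reverse=True, so tied rows keep input order.
    ordered = sorted(rows, key=lambda row: row[1], reverse=True)
    ranked: List[Tuple[int, str, float]] = []
    for offset, (model, score) in enumerate(ordered):
        position = first_position + offset
        if ranked and score == ranked[-1][2]:
            rank = ranked[-1][0]  # same score as the previous row: reuse its rank
        else:
            rank = position  # new score: rank equals the row's position
        ranked.append((rank, model, score))
    return ranked


# Positions 14-19 of the GPQA-Diamond table, scores as listed.
rows = [
    ("Doubao Seed 2.0 Pro", 88.9),
    ("GPT-5.4 mini", 88.0),
    ("Grok 4 Heavy", 88.0),
    ("Grok 4", 88.0),
    ("Kimi K2.5", 87.6),
    ("Qwen3-Max-Thinking", 87.4),
]

for rank, model, score in competition_ranks(rows, first_position=14):
    print(f"{rank:>2}  {model:<22}{score:.1f}")
```

With `first_position=14`, the three models at 88.0 all receive rank 15 and Kimi K2.5 drops to rank 18, so the ranks come out as 14, 15, 15, 15, 18, 19.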
How we measure