Reasoning

MMLU

Massive Multitask Language Understanding - 57 academic subjects, ~16K questions.

18 models have published a score
#   Model             Company          Score
1   Claude Opus 4.6   Anthropic        92.1
2   Gemini 3 Pro      Google DeepMind  91.8
3   GPT-5.2           OpenAI           91.4
4   Gemini 3.1 Pro    Google DeepMind  91.4
5   DeepSeek R1 0528  DeepSeek         90.8
6   Nova Premier      Amazon           87.4
7   Grok 4            xAI              86.6
8   Nemotron 3 Super  Nvidia           86.0
9   Nova Pro          Amazon           85.9
10  Llama 4 Maverick  Meta             85.5
11  Mistral Large 3   Mistral AI       85.5
12  Command A         Cohere           85.5
13  MiniMax M2.5      MiniMax          82.0
14  Nova Lite         Amazon           80.5
15  AFM Server        Apple            80.0
16  Llama 4 Scout     Meta             79.6
17  Yi-Lightning      01.AI            76.0
18  AFM On-Device     Apple            67.9