Reasoning

MMLU

Massive Multitask Language Understanding - 57 academic subjects, ~16K questions.

18 models have published a score

#	Model	Company	Score
1	Claude Opus 4.6	Anthropic	92.1
2	Gemini 3 Pro	Google DeepMind	91.8
3	GPT-5.2	OpenAI	91.4
4	Gemini 3.1 Pro	Google DeepMind	91.4
5	DeepSeek R1 0528	DeepSeek	90.8
6	Nova Premier	Amazon	87.4
7	Grok 4	xAI	86.6
8	Nemotron 3 Super	Nvidia	86.0
9	Nova Pro	Amazon	85.9
10	Llama 4 Maverick	Meta	85.5
11	Mistral Large 3	Mistral AI	85.5
12	Command A	Cohere	85.5
13	MiniMax M2.5	MiniMax	82.0
14	Nova Lite	Amazon	80.5
15	AFM Server	Apple	80.0
16	Llama 4 Scout	Meta	79.6
17	Yi-Lightning	01.AI	76.0
18	AFM On-Device	Apple	67.9