Frontier Benchmarks AI

Grok 4

Released 2025-07 · reasoning · 256K tokens · 8 benchmarks

Editorial notes

Released July 2025. The Heavy variant reaches 50.7% on HLE with tools. xAI retired the grok-4-0709 API endpoint on 2026-05-15. It was replaced by grok-4.3.
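For callers that still have the retired ID hard-coded, the note above suggests a simple date-based fallback. The following is a minimal sketch, not xAI's migration procedure: `resolve_model` is a hypothetical helper, and it assumes the endpoint is unavailable from the retirement date itself onward.

```python
from datetime import date

# Values taken from the editorial note.
RETIRED_MODEL = "grok-4-0709"
REPLACEMENT_MODEL = "grok-4.3"
RETIREMENT_DATE = date(2026, 5, 15)


def resolve_model(requested: str, today: date) -> str:
    """Return the model ID to use on `today` (hypothetical helper).

    Assumption: the retired endpoint stops working on RETIREMENT_DATE
    itself, so the replacement is used from that day onward.
    """
    if requested == RETIRED_MODEL and today >= RETIREMENT_DATE:
        return REPLACEMENT_MODEL
    # Any other ID, or any earlier date, passes through unchanged.
    return requested
```

Passing `today` explicitly rather than reading the clock inside the function keeps the helper easy to test.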

Spec sheet

Company: xAI
Country: US
Type: reasoning
Release: 2025-07
Context: 256K tokens
Pricing (xAI): $3 input / $15 output per 1M tokens
Slug: grok-4
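As a rough illustration of what the pricing line implies, the sketch below estimates the cost of a single request. It assumes the two figures are USD per million input tokens and per million output tokens respectively; `estimate_cost_usd` is a hypothetical helper, not part of any xAI SDK.

```python
# Assumed reading of the spec sheet: USD per one million tokens.
INPUT_USD_PER_M = 3.0    # input (prompt) tokens
OUTPUT_USD_PER_M = 15.0  # output (completion) tokens


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request (hypothetical helper)."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = input_tokens * INPUT_USD_PER_M + output_tokens * OUTPUT_USD_PER_M
    return total / 1_000_000


# Example: a 200,000-token prompt with a 4,000-token answer.
# 200,000 * $3/M = $0.60 and 4,000 * $15/M = $0.06, so about $0.66.
print(f"${estimate_cost_usd(200_000, 4_000):.2f}")
```

Under this reading, output tokens cost five times as much as input tokens, so long answers dominate the bill faster than long prompts do.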

Benchmarks (8)

Reasoning 4

GPQA-Diamond

Graduate-level physics, chemistry, and biology: doctoral-level questions.

88.0

MMLU-Pro

A harder version of MMLU with more difficult questions and 10 answer options.

87.0

MMLU

Massive Multitask Language Understanding: 57 academic subjects, ~16K questions

86.6

Humanity's Last Exam

The hardest known benchmark: novel academic problems.

25.4

Coding 3

Math 1

AIME-2024

American Invitational Mathematics Examination 2024.

94.0

Cite this model


BibTeX

@misc{frontier-grok-4,
  title  = {Grok 4},
  author = {{xAI}},
  year   = {2025},
  note   = {Frontier Benchmarks AI atlas. Accessed 2026-05-08},
  url    = {https://frontierbenchmarks.com/models/grok-4}
}

APA

xAI (2025). Grok 4 [Large language model]. Frontier Benchmarks AI. Retrieved 2026-05-08, from https://frontierbenchmarks.com/models/grok-4

This citation reflects the atlas page, not the model's original paper. For the paper, see the "Resources" section above.
