Frontier Benchmarks AI

Grok 4

Released 2025-07 · reasoning · 256K tokens · 8 benchmarks

Editorial notes

Released July 2025. The Heavy variant scores 50.7% on HLE with tools. xAI retired the grok-4-0709 API endpoint on 2026-05-15, and it was replaced by grok-4.3.

Spec sheet

Company: xAI
Country: US
Type: reasoning
Release: 2025-07
Context: 256K tokens
Pricing (xAI): $3 input / $15 output per 1M tokens
Slug: grok-4
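The listed rate can be turned into a per-request cost estimate. The following is a minimal sketch, assuming "$3/$15/M" means USD 3 per million input tokens and USD 15 per million output tokens, and assuming reasoning tokens are billed at the output rate. The function name `estimate_cost_usd` is hypothetical, and cached-input discounts or any tiered pricing are not modeled.

```python
# Hypothetical cost estimator. Assumes the spec sheet's "$3/$15/M" means
# USD 3 per 1M input tokens and USD 15 per 1M output tokens, with reasoning
# tokens counted as output. Caching discounts and pricing tiers are ignored.
INPUT_USD_PER_M = 3.0
OUTPUT_USD_PER_M = 15.0


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of one request at the listed rates."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    return (input_tokens * INPUT_USD_PER_M + output_tokens * OUTPUT_USD_PER_M) / 1_000_000


# Example: a 10,000-token prompt with a 2,000-token answer.
print(f"${estimate_cost_usd(10_000, 2_000):.4f}")  # prints "$0.0600"
```

Under these assumptions, output tokens cost five times as much as input tokens, so long reasoning traces can dominate the bill even for short prompts.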

Benchmarks (8)

Reasoning (4)

GPQA-Diamond

Graduate-level physics, chemistry, and biology: PhD-level multiple-choice questions.

88.0
MMLU-Pro

A harder version of MMLU with more reasoning-focused questions and 10 answer options instead of 4.

87.0
MMLU

Massive Multitask Language Understanding: 57 academic subjects, ~16K multiple-choice questions.

86.6
Humanity's Last Exam

A frontier-difficulty benchmark of novel, expert-written academic problems.

25.4

Coding (3)

Math (1)

AIME-2024

American Invitational Mathematics Examination 2024.

94.0

Cite this model


BibTeX

@misc{frontier-grok-4,
  title  = {Grok 4},
  author = {{xAI}},
  year   = {2025},
  note   = {Frontier Benchmarks AI atlas. Accessed 2026-05-08},
  url    = {https://frontierbenchmarks.com/models/grok-4}
}

APA

xAI (2025). Grok 4 [Large language model]. Frontier Benchmarks AI. Retrieved 2026-05-08, from https://frontierbenchmarks.com/models/grok-4

Citation reflects the atlas page, not the original model paper. For the paper, see the "Resources" section above.
