Knowledge

SimpleQA

Short-answer factuality benchmark.

6 models have a published score.
| # | Model | Company | Score |
|---|-------|---------|-------|
| 1 | Nova Premier | Amazon | 86.3 |
| 2 | Gemini 3.1 Pro | Google DeepMind | 72.1 |
| 3 | Gemini 3 Pro | Google DeepMind | 72.1 |
| 4 | GPT-5.2 | OpenAI | 58.0 |
| 5 | Step 3.5 Flash | StepFun | 31.6 |
| 6 | Mistral Large 3 | Mistral AI | 23.8 |