Knowledge
SimpleQA
Short-answer factuality benchmark.
Six models have a published score.
| # | Model | Company | Score |
|---|---|---|---|
| 1 | Nova Premier | Amazon | 86.3 |
| 2 | Gemini 3.1 Pro | Google DeepMind | 72.1 |
| 3 | Gemini 3 Pro | Google DeepMind | 72.1 |
| 4 | GPT-5.2 | OpenAI | 58.0 |
| 5 | Step 3.5 Flash | StepFun | 31.6 |
| 6 | Mistral Large 3 | Mistral AI | 23.8 |