
SWE-bench-Pro

Professional version of SWE-bench with more complex issues.

15 models have a published score.
Rank  Model                  Company      Score
1     Claude Mythos Preview  Anthropic    77.8
2     MiMo V2.5              Xiaomi       76.0
3     Claude Opus 4.7        Anthropic    64.3
4     GPT-5.5                OpenAI       58.6
5     Kimi K2.6              Moonshot AI  58.6
6     GLM-5.1                Zhipu AI     58.4
7     GPT-5.4                OpenAI       57.7
8     GPT-5.3-Codex          OpenAI       56.8
9     MiniMax M2.7           MiniMax      56.2
10    GPT-5.2                OpenAI       55.6
11    DeepSeek V4 Pro        DeepSeek     55.4
12    Qwen3.6-27B            Alibaba      53.5
13    Kimi K2.5              Moonshot AI  50.7
14    Qwen3.6-35B-A3B        Alibaba      49.5
15    Qwen3-Coder-Next       Alibaba      44.3