Agentic

OSWorld

Computer-use benchmark: real desktop tasks.

9 models have published scores
#  Model                  Company      Score
1  Claude Mythos Preview  Anthropic    79.6
2  GPT-5.5                OpenAI       78.7
3  Claude Opus 4.7        Anthropic    78.0
4  GPT-5.4 Pro            OpenAI       75.0
5  GPT-5.4                OpenAI       75.0
6  Kimi K2.6              Moonshot AI  73.1
7  Claude Opus 4.6        Anthropic    72.7
8  Claude Sonnet 4.6      Anthropic    72.5
9  GPT-5.3-Codex          OpenAI       64.7