
Terminal-Bench-Hard

Hard tasks completed in a terminal (command-line interface).

1 model has a published score.
Rank  Model            Company    Score
1     Claude Opus 4.5  Anthropic  44.0