Public, reproducible benchmark of CLI coding agents on SWE-bench Verified.

| Agent | Model | Pass rate | Tasks | Skipped | Median cost | Median time | Median tokens (input + output) | Last run (UTC) |
|---|---|---|---|---|---|---|---|---|
| claude-code | glm-5.1 | 75% | 4 | 0 | $0.59 | 186s | 38287 | 2026-06-08T05:23:02Z |
| codex | MiniMax-M3 | 25% | 4 | 0 | unknown | 325s | 752198 | 2026-06-08T05:47:13Z |
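
For readers who want to check how a row like the ones above could be derived, here is a minimal Python sketch. It is not the benchmark's actual aggregation code. The record fields (`passed`, `skipped`, `cost_usd`, `seconds`, `tokens`), the `summarize` helper, and the sample values are all hypothetical. It also assumes that the task count excludes skipped tasks and that runs with no reported cost are left out of the cost median, neither of which the table confirms.

```python
from statistics import median


def summarize(results):
    """Aggregate hypothetical per-task records into one summary row."""
    # Assumption: skipped tasks are excluded from every statistic.
    attempted = [r for r in results if not r["skipped"]]
    # Assumption: a missing cost is stored as None and ignored.
    costs = [r["cost_usd"] for r in attempted if r["cost_usd"] is not None]
    if not attempted:
        return {"tasks": 0, "skipped": len(results)}
    return {
        "pass_rate": sum(r["passed"] for r in attempted) / len(attempted),
        "tasks": len(attempted),
        "skipped": len(results) - len(attempted),
        "median_cost": median(costs) if costs else None,
        "median_time": median(r["seconds"] for r in attempted),
        "median_tokens": median(r["tokens"] for r in attempted),
    }


# Made-up records for illustration only; not taken from any real run.
sample = [
    {"passed": True, "skipped": False, "cost_usd": 0.50, "seconds": 100.0, "tokens": 1000},
    {"passed": True, "skipped": False, "cost_usd": None, "seconds": 200.0, "tokens": 2000},
    {"passed": False, "skipped": False, "cost_usd": 0.70, "seconds": 300.0, "tokens": 3000},
    {"passed": True, "skipped": False, "cost_usd": 0.90, "seconds": 400.0, "tokens": 4000},
]
row = summarize(sample)
print(row)
```

With only four tasks per agent, each task moves the pass rate by 25 percentage points, so the percentages are best read as rough indicators rather than precise rankings.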