cae leaderboard

Public, reproducible benchmark of CLI coding agents on SWE-bench Verified.

Agent	Model	Pass rate	# tasks	Skipped	Median cost	Median time	Median tokens (in+out)	Last run
claude-code	glm-5.1	75%	4	0	$0.59	186s	38287	2026-06-08T05:23:02Z
codex	MiniMax-M3	25%	4	0	$?	325s	752198	2026-06-08T05:47:13Z
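
How the columns above might be derived from per-task results can be sketched as follows. This is a minimal illustration, not the harness's actual code: the record fields (`resolved`, `skipped`, `cost_usd`, `seconds`, `tokens`), the `summarize` helper, and all sample values are hypothetical. It also assumes that skipped tasks are excluded from the pass-rate denominator and that a missing cost is what appears as `$?`; the source confirms neither.

```python
from statistics import median

# Hypothetical per-task records for one agent/model pair.
# Field names and values are illustrative, not taken from the real harness.
records = [
    {"resolved": True,  "skipped": False, "cost_usd": 0.40, "seconds": 100, "tokens": 30000},
    {"resolved": True,  "skipped": False, "cost_usd": 0.50, "seconds": 200, "tokens": 35000},
    {"resolved": True,  "skipped": False, "cost_usd": 0.60, "seconds": 300, "tokens": 40000},
    {"resolved": False, "skipped": False, "cost_usd": 0.70, "seconds": 400, "tokens": 45000},
]


def _median_or_none(values):
    """Median of the known values, or None when nothing was recorded."""
    known = [v for v in values if v is not None]
    return median(known) if known else None


def summarize(records):
    """Aggregate per-task records into one leaderboard row."""
    # Assumption: skipped tasks do not count toward the pass rate.
    attempted = [r for r in records if not r["skipped"]]
    passed = sum(1 for r in attempted if r["resolved"])
    return {
        "pass_rate": passed / len(attempted) if attempted else None,
        "tasks": len(attempted),
        "skipped": len(records) - len(attempted),
        # None here would correspond to an unknown cost in the table.
        "median_cost": _median_or_none(r["cost_usd"] for r in attempted),
        "median_seconds": _median_or_none(r["seconds"] for r in attempted),
        "median_tokens": _median_or_none(r["tokens"] for r in attempted),
    }


row = summarize(records)
```

With only 4 tasks per row, the pass rate can only move in steps of 25 percentage points, so a single task separates 50% from 75%.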

Built 2026-06-08T06:21:26Z with harness 8fdca78 · reproducibility