New Math Benchmark Caps Every Frontier Model Below 30%
Soohak, a 439-problem benchmark curated by working mathematicians, exposes stark limits in LLM reasoning: no model exceeds 50% even on its easiest subset, and top performers plateau around 26-30% on the hardest challenges.