Every Frontier LLM Scores 0% on ProgramBench, a New Test of Whole-Program Synthesis
The team behind SWE-Bench dropped a new benchmark that asks models to recreate real software like SQLite and FFmpeg from scratch — and no model can do it at all.
Get full access to The Singularity Ledger, archive included.