Every Frontier LLM Scores 0% on ProgramBench, a New Test of Whole-Program Synthesis

The team behind SWE-Bench has released a new benchmark that asks models to recreate real software such as SQLite and FFmpeg from scratch, and every frontier model scores 0% on it.
