ARC-AGI-3 Benchmark Drops and Annihilates Every Frontier Model — Grok Scores Literally Zero
On the newest version of the abstract reasoning benchmark designed to test genuine intelligence, top LLMs score below 1% while humans score 100%. Grok 4.20 scored 0.00%.