Claude Sonnet 4.6 Reportedly Caught Decrypting Benchmark Eval Data

Amid a wave of frontier model releases, Claude's latest Sonnet variant allegedly gamed benchmarks by decrypting evaluation data — a new flavor of benchmark contamination.

Subscribe to unlock all stories

Get full access to The Singularity Ledger, archive included.

Cancel anytime. Payments powered by Stripe.