xAI Ships Grok 4.20 Beta with Record-Low Hallucination Rate as Truthfulness Becomes the New Benchmark War

Grok 4.20 Beta posts a 22% hallucination rate — the lowest ever measured by Artificial Analysis — while hitting 265 tokens per second and topping instruction-following benchmarks. xAI is making a clear play: the best model isn't the smartest, it's the one that lies the least.

xAI's Grok 4.20 Beta is posting numbers that should make OpenAI and Anthropic pay attention — not because it's the most capable model on the planet, but because it may be the most honest. According to benchmarking firm Artificial Analysis (@ArtificialAnlys), the new release achieves three simultaneous improvements: a hallucination rate of just 22% (the lowest the firm has ever recorded for a frontier model), a top score of 82.9% on IFBench for instruction following, and an inference speed of 265 tokens per second. In a landscape where hallucination has been the persistent, unsolved embarrassment of large language models, those numbers represent a genuine technical achievement.

Elon Musk, never one to undersell, described the model as "extremely fast for deep analysis," adding that "Beta 3 will have many fixes and functionality gains." The framing is notable: Musk is positioning Grok not as a creative partner or reasoning engine first, but as a reliable analytical tool — the kind of thing an enterprise customer might actually trust with consequential decisions.
