Meta's REFRAG Claims 30x Faster RAG by Filtering Context Before It Hits the LLM
A new technique from Meta called REFRAG reportedly compresses and filters retrieved context before passing it to a language model, achieving a 30.85x speedup in time-to-first-token while outperforming baseline LLaMA on accuracy.
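The core idea, compress most retrieved chunks and expand only the ones that matter, can be sketched in a few lines. This is a hypothetical toy illustration, not Meta's implementation: REFRAG compresses chunks into dense embeddings and uses a learned policy to decide which chunks to expand, whereas the sketch below stands in for both with a simple word-overlap score and a placeholder string for compressed chunks.

```python
def score(query: str, chunk: str) -> float:
    """Toy relevance score: fraction of query words present in the chunk.
    (A stand-in for the learned selection policy described in the paper.)"""
    q = set(query.lower().split())
    c = set(chunk.lower().split())
    return len(q & c) / max(len(q), 1)

def build_context(query: str, chunks: list[str], expand_top_k: int = 2) -> str:
    """Keep the top-k most relevant chunks verbatim; replace the rest with a
    short placeholder standing in for a compressed chunk embedding."""
    ranked = sorted(range(len(chunks)),
                    key=lambda i: score(query, chunks[i]), reverse=True)
    keep = set(ranked[:expand_top_k])
    parts = [chunks[i] if i in keep else "[compressed chunk]"
             for i in range(len(chunks))]
    return "\n".join(parts)

chunks = [
    "cats sleep for most of the day",
    "the speed improvement comes from shrinking the prompt",
    "llamas are domesticated camelids",
]
context = build_context("speed improvement", chunks, expand_top_k=1)
```

The speedup comes from the same place in both the sketch and the real system: the model decodes over a much shorter effective context, since compressed chunks occupy far fewer token positions than their full text.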