CompactRAG
Research summary, May 29, 2026. Source: arXiv:2602.05728 (appearing at WWW ’26)

The Problem It Solves

Standard iterative RAG for multi-hop questions (e.g., “Who directed the film starring the actor born in the same city as Marie Curie?”) works by alternating between retrieval and LLM reasoning at each hop. This means:

- N hops = N LLM calls for reasoning, plus retrieval at each step
- High token overhead: each call re-reads growing context
- Entity drift: the entity being tracked across hops can get corrupted or lost as queries are reformulated each step

The Core Idea: Decouple Offline and Online

CompactRAG does most of the work once, offline, rather than at query time. ...
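
To make the cost of the iterative baseline under "The Problem It Solves" concrete, here is a minimal Python sketch of a hop-by-hop loop. It is not CompactRAG and not the paper's code. The retriever is an exact-key dictionary lookup, the "LLM" is a stub that returns the last word of the newest passage, and token counts are approximated by whitespace-separated words. All names (`CostLog`, `retrieve`, `stub_llm`, `iterative_rag`) and the placeholder entities (`PersonA`, `CityX`, and so on) are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CostLog:
    """Counters for the costs the summary lists (illustrative only)."""
    llm_calls: int = 0
    retrievals: int = 0
    tokens_read: int = 0  # approximated as whitespace-separated words


def retrieve(query: str, corpus: dict[str, str]) -> str:
    """Stand-in retriever: exact-key lookup, not a real vector search."""
    return corpus.get(query, "")


def stub_llm(context: list[str], log: CostLog) -> str:
    """Stand-in LLM call: 'reads' the full context, returns the newest entity."""
    log.llm_calls += 1
    # Every call re-reads all passages gathered so far (growing context).
    log.tokens_read += sum(len(p.split()) for p in context)
    return context[-1].split()[-1]


def iterative_rag(start: str, relations: list[str],
                  corpus: dict[str, str]) -> tuple[str, CostLog]:
    """Alternate retrieval and reasoning once per hop."""
    log = CostLog()
    context: list[str] = []
    entity = start
    for relation in relations:
        query = f"{entity} {relation}"  # query is reformulated at each hop
        passage = retrieve(query, corpus)
        log.retrievals += 1
        if not passage:
            # A corrupted or lost entity breaks the chain (entity drift).
            raise LookupError(f"no passage for query: {query!r}")
        context.append(passage)
        entity = stub_llm(context, log)
    return entity, log


# Toy 4-hop chain with placeholder entities (no real-world facts).
corpus = {
    "PersonA birthplace": "PersonA was born in CityX",
    "CityX born_actor": "An actor born in CityX is ActorY",
    "ActorY starred_in": "ActorY starred in FilmZ",
    "FilmZ director": "FilmZ was directed by DirectorW",
}
relations = ["birthplace", "born_actor", "starred_in", "director"]

answer, log = iterative_rag("PersonA", relations, corpus)
print(answer, log)
```

In this toy run, four hops cost four stub LLM calls and four retrievals, and the stub reads 54 words in total even though the four passages contain only 21 words between them, because each call re-reads the earlier passages. A misspelled or drifted entity at any hop raises `LookupError`, which is a crude stand-in for the entity-drift failure mode.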