TurboQuant
TurboQuant is a vector quantization paper that shows how to compress high-dimensional vectors very aggressively while keeping either reconstruction error or inner-product error near the theoretical limit. The core idea is: randomly rotate the vector first, then quantize each coordinate independently; for inner products, TurboQuant adds a 1-bit residual correction step to remove bias.[1]

What problem it solves
The paper targets three important settings: large language model KV-cache compression, vector database search, and general online quantization of embeddings. In all three, the goal is to store vectors in fewer bits without destroying geometry, especially norms and dot products.[1] ...
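
To make the rotate-then-quantize idea from the opening paragraph concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not the paper's algorithm: the function names (random_rotation, quantize, dequantize) are hypothetical, the uniform scalar quantizer with a 3-sigma clipping range is a simple stand-in for whatever per-coordinate codebook the paper uses, and the 1-bit residual correction for inner products is left out.

```python
import numpy as np

def random_rotation(dim, seed=0):
    """Draw a random orthogonal matrix via QR of a Gaussian matrix."""
    rng = np.random.default_rng(seed)
    q, r = np.linalg.qr(rng.standard_normal((dim, dim)))
    # Flip column signs so the result is not biased by QR's sign convention.
    return q * np.sign(np.diag(r))

def quantize(x, rotation, bits):
    """Rotate a nonzero vector x, then quantize each coordinate independently.

    Assumption: a uniform scalar quantizer stands in for the paper's codebook.
    """
    dim = x.shape[0]
    norm = np.linalg.norm(x)           # stored separately in full precision
    y = rotation @ (x / norm)          # rotated unit vector
    # After a random rotation, coordinates are roughly N(0, 1/dim);
    # clip at about three standard deviations (an illustrative choice).
    limit = 3.0 / np.sqrt(dim)
    levels = 2 ** bits
    step = 2.0 * limit / levels
    codes = np.clip(np.floor((y + limit) / step), 0, levels - 1)
    return codes.astype(np.uint8), norm

def dequantize(codes, norm, rotation, bits):
    """Map codes back to bin centers, undo the rotation, restore the norm."""
    dim = codes.shape[0]
    limit = 3.0 / np.sqrt(dim)
    step = 2.0 * limit / (2 ** bits)
    y_hat = (codes + 0.5) * step - limit
    return norm * (rotation.T @ y_hat)

# Usage: compress a random 128-dimensional vector to 4 bits per coordinate.
dim, bits = 128, 4
rotation = random_rotation(dim, seed=1)
x = np.random.default_rng(2).standard_normal(dim)
codes, norm = quantize(x, rotation, bits)
x_hat = dequantize(codes, norm, rotation, bits)
rel_error = np.linalg.norm(x - x_hat) / np.linalg.norm(x)
print(f"relative reconstruction error at {bits} bits: {rel_error:.3f}")
```

The rotation is what makes a single fixed per-coordinate quantizer reasonable here: after rotating, every coordinate of a unit vector follows roughly the same distribution regardless of how the input's energy was originally spread, so one clipping range can serve all coordinates. Because the rotation is orthogonal, applying its transpose after dequantization maps the quantization error back without amplifying it.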