Quantization

Compressing RAG Embeddings with TurboQuant

TurboQuant compresses embeddings aggressively without corpus-specific training. This post covers the algorithm, the turboquant-embed implementation, and benchmarks showing that retrieval quality holds up on BeIR.

From QJL to TurboQuant: Data-Oblivious Vector Quantization

TurboQuant achieves near-optimal vector quantization without seeing the data. This post traces the full theory, from random projections and 1-bit quantized JL transforms through polar decompositions to the final distortion bounds, with complete proofs.