RyanCodrai / turbovec
Summary
turbovec is a vector index written in Rust, with Python bindings, that implements Google Research's TurboQuant algorithm. TurboQuant is a data-oblivious quantizer: it needs no training on the corpus, so vectors can be ingested online. A 10 million document corpus that takes 31 GB of RAM as float32 fits in 4 GB, a 7.75x reduction. Search is 12–20% faster than FAISS IndexPQFastScan on ARM and matches or exceeds its speed on x86. On recall, TurboQuant beats FAISS IndexPQ by 0.4–3.4 points at R@1 on OpenAI embeddings (d=1536 and d=3072) at 2-bit and 4-bit quantization. Other features include efficient search-time filtering and fully local operation, which suits air-gapped RAG stacks. Drop-in vector store replacements are available for LangChain, LlamaIndex, Haystack, and Agno.
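The memory figures can be sanity-checked with simple arithmetic. The sketch below assumes 768-dimensional vectors and 4 bits per coordinate; neither value is stated in the summary, they are chosen only because they land close to the quoted 31 GB and 4 GB. Under those assumptions the raw payload ratio is 8x, while the quoted 7.75x is the ratio of the rounded figures (31 / 4).

```python
def index_size_gb(n_vectors: int, dim: int, bits_per_coord: float) -> float:
    """Raw vector payload in GB (10**9 bytes), ignoring any per-vector metadata."""
    return n_vectors * dim * bits_per_coord / 8 / 1e9


N_VECTORS = 10_000_000  # corpus size quoted in the summary
DIM = 768               # assumption: the summary does not state the dimension
QUANT_BITS = 4          # assumption: the summary does not state the bit width

float32_gb = index_size_gb(N_VECTORS, DIM, 32)
quantized_gb = index_size_gb(N_VECTORS, DIM, QUANT_BITS)

print(f"float32 payload:   {float32_gb:.2f} GB")
print(f"quantized payload: {quantized_gb:.2f} GB")
print(f"compression ratio: {float32_gb / quantized_gb:.1f}x")
```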
Key takeaway
For AI engineers building RAG systems where memory footprint, search latency, or data privacy is critical, turbovec is a compelling alternative to traditional vector indexes. Consider it when you need large memory savings (a 31 GB float32 corpus shrinks to 4 GB) or faster queries than FAISS, especially on ARM. Because it runs entirely locally, it also supports fully air-gapped RAG stacks, which simplifies compliance for sensitive applications.
Key insights
TurboQuant provides a data-oblivious vector quantization method for fast, memory-efficient, and accurate approximate nearest neighbor search.
Principles
- Data-oblivious quantization eliminates training phases and parameter tuning.
- Random rotation makes each coordinate of a normalized vector follow a known Beta-type distribution, independent of the data.
- Length-renormalization corrects inner product bias from scalar quantization.
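The rotation principle above can be checked numerically. Rotating a unit vector by a random rotation gives a point uniformly distributed on the unit sphere, and for such a point the square of any single coordinate follows Beta(1/2, (d-1)/2), whose mean is 1/d. The sketch below is an illustration of that fact, not turbovec code: it samples uniform unit vectors directly (normalized Gaussians) and compares the empirical mean of the squared first coordinate with 1/d. The dimension and sample count are arbitrary choices.

```python
import math
import random


def random_unit_vector(dim, rng):
    """Uniform point on the unit sphere: normalize a standard Gaussian vector."""
    gauss = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in gauss))
    return [x / norm for x in gauss]


def mean_squared_first_coord(dim, samples, seed=0):
    """Monte Carlo estimate of E[x_1^2] for a uniform unit vector in `dim` dimensions."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(samples):
        total += random_unit_vector(dim, rng)[0] ** 2
    return total / samples


DIM = 64  # arbitrary illustrative dimension
estimate = mean_squared_first_coord(DIM, samples=5000)
print(f"empirical E[x_1^2] = {estimate:.5f}, Beta mean 1/d = {1 / DIM:.5f}")
```

Because the distribution depends only on the dimension, a quantizer can be designed for it ahead of time, which is what makes a training-free codebook possible.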
Method
Vectors are normalized, randomly rotated, calibrated per-coordinate, quantized via Lloyd-Max, bit-packed, and scored with length-renormalization.
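The pipeline can be sketched in plain Python as a toy. This is not turbovec's implementation: every function name is hypothetical, bit-packing and per-coordinate calibration are omitted, and "length-renormalization" is interpreted here as dividing the score by the norm of the decoded vector, which is one plausible reading. The codebook is fitted on synthetic unit-vector coordinates rather than on the corpus, to mirror the data-oblivious idea.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def random_rotation(dim, rng):
    """Random orthonormal matrix (list of rows) via Gram-Schmidt on Gaussian rows."""
    rows = []
    for _ in range(dim):
        r = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        for q in rows:  # remove components along rows already accepted
            proj = dot(r, q)
            r = [a - proj * b for a, b in zip(r, q)]
        rows.append(normalize(r))
    return rows


def rotate(matrix, v):
    return [dot(row, v) for row in matrix]


def nearest(centroids, x):
    """Index of the centroid closest to scalar x."""
    return min(range(len(centroids)), key=lambda i: abs(centroids[i] - x))


def lloyd_max(samples, levels, iters=20):
    """1-D Lloyd-Max: alternate nearest-centroid assignment and centroid means."""
    ordered = sorted(samples)
    n = len(ordered)
    centroids = [ordered[(2 * i + 1) * n // (2 * levels)] for i in range(levels)]
    for _ in range(iters):
        buckets = [[] for _ in range(levels)]
        for x in ordered:
            buckets[nearest(centroids, x)].append(x)
        centroids = [sum(b) / len(b) if b else c for b, c in zip(buckets, centroids)]
    return centroids


def encode(rotation, centroids, v):
    """Normalize, rotate, then replace each coordinate by its codebook index."""
    return [nearest(centroids, x) for x in rotate(rotation, normalize(v))]


def score(rotation, centroids, codes, query):
    """Inner product with the decoded vector, rescaled by the decoded length."""
    decoded = [centroids[c] for c in codes]
    length = math.sqrt(dot(decoded, decoded))
    return dot(rotate(rotation, normalize(query)), decoded) / length


rng = random.Random(0)
DIM, BITS = 32, 4  # small illustrative sizes
rotation = random_rotation(DIM, rng)

# Codebook fitted on synthetic unit-vector coordinates, never on the documents.
synthetic = [c for _ in range(100)
             for c in normalize([rng.gauss(0.0, 1.0) for _ in range(DIM)])]
centroids = lloyd_max(synthetic, levels=2 ** BITS)

doc = [rng.gauss(0.0, 1.0) for _ in range(DIM)]
query = [d + 0.5 * rng.gauss(0.0, 1.0) for d in doc]  # a query near the document
exact = dot(normalize(doc), normalize(query))
approx = score(rotation, centroids, encode(rotation, centroids, doc), query)
print(f"exact cosine {exact:.3f}, quantized estimate {approx:.3f}")
```

Only the document is quantized; the query is rotated but kept in full precision, which is a common asymmetric-scoring choice and an assumption of this sketch.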
In practice
- Use turbovec as a drop-in replacement for the vector store in LangChain, LlamaIndex, Haystack, or Agno.
- Use IdMapIndex to manage vectors with stable external IDs and O(1) deletions.
- Pass an allowlist of IDs at search time to implement filtered search in hybrid retrieval scenarios.
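To make the last two points concrete, here is a toy sketch of how stable external IDs, O(1) deletion, and allowlist filtering can fit together. It is not the turbovec IdMapIndex API: the class name `ToyIdMap`, its methods, and the brute-force scoring are all hypothetical. Deletion is O(1) because the last slot is moved into the hole left by the removed vector (swap-remove), so no other entries shift.

```python
class ToyIdMap:
    """Toy illustration only; not the turbovec IdMapIndex API."""

    def __init__(self):
        self.ids = []      # slot -> external id
        self.vectors = []  # slot -> vector
        self.slot_of = {}  # external id -> slot

    def add(self, ext_id, vector):
        if ext_id in self.slot_of:
            raise KeyError(f"duplicate id {ext_id}")
        self.slot_of[ext_id] = len(self.ids)
        self.ids.append(ext_id)
        self.vectors.append(vector)

    def remove(self, ext_id):
        """O(1) delete: pop the last slot and move it into the freed slot."""
        slot = self.slot_of.pop(ext_id)
        last_id, last_vec = self.ids.pop(), self.vectors.pop()
        if slot < len(self.ids):  # the removed entry was not the last one
            self.ids[slot], self.vectors[slot] = last_id, last_vec
            self.slot_of[last_id] = slot

    def search(self, query, k, allowlist=None):
        """Brute-force inner-product search, skipping ids outside the allowlist."""
        scored = []
        for ext_id, vec in zip(self.ids, self.vectors):
            if allowlist is not None and ext_id not in allowlist:
                continue
            scored.append((sum(q * v for q, v in zip(query, vec)), ext_id))
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [ext_id for _, ext_id in scored[:k]]


index = ToyIdMap()
index.add(101, [1.0, 0.0])
index.add(202, [0.8, 0.6])
index.add(303, [0.0, 1.0])
index.remove(101)

hits = index.search([1.0, 0.0], k=2)
# The allowlist would typically come from a keyword or metadata filter.
filtered = index.search([1.0, 0.0], k=2, allowlist={303})
print(hits, filtered)
```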
Topics
- Vector Search
- TurboQuant
- Quantization
- RAG Systems
- Memory Optimization
- Approximate Nearest Neighbor
Best for: MLOps Engineer, NLP Engineer, AI Architect, Machine Learning Engineer, AI Engineer, AI Scientist
Editorial summary, takeaway, and curation by AIssential. Original article published by GitHub Trending: All languages.