10M Vectors. 4GB RAM. Zero Training. Meet turbovec
Vector search shouldn’t cost you 31GB of RAM, a separate training phase, or a drop in recall when you add filters. If you’re building RAG systems that care about memory, latency, or privacy, FAISS is starting to feel heavy.
turbovec
turbovec is a Rust-native vector index with Python bindings that compresses embeddings 8–16x, skips the training step entirely, and matches or beats FAISS’s IndexPQFastScan on search speed.
Why it’s different
- Faster than FAISS: Hand-tuned NEON & AVX-512 kernels beat IndexPQFastScan by 12–20% on ARM, and match or beat it on x86.
- 31GB → 4GB: A 10M-vector corpus shrinks from float32 to ~4GB with 4-bit quantization. No recall cliff.
- Zero training, online ingest: Add vectors anytime. No codebook training, no rebuilds, no parameter tuning. The index grows with your data.
- Native filtering at search time: Pass an allowlist to .search() and the SIMD kernel skips disallowed blocks before scoring. No over-fetching. No recall penalty.
- 100% local: No managed service. No telemetry. Pair with any open embedding model for a fully air-gapped RAG stack.
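The memory numbers are easy to sanity-check with back-of-the-envelope arithmetic. The sketch below is not part of turbovec. It assumes a 768-dimensional embedding, which the post does not state but which makes the 31GB and 4GB figures line up, and it ignores any per-vector overhead such as stored norms or IDs.

```python
def index_size_gb(num_vectors: int, dim: int, bits_per_coord: int) -> float:
    """Raw storage for the vector payload only, in decimal gigabytes."""
    return num_vectors * dim * bits_per_coord / 8 / 1e9


num_vectors = 10_000_000
dim = 768  # assumption: not stated in the post

float32_gb = index_size_gb(num_vectors, dim, 32)  # ≈ 30.72
four_bit_gb = index_size_gb(num_vectors, dim, 4)  # ≈ 3.84
two_bit_gb = index_size_gb(num_vectors, dim, 2)   # ≈ 1.92

print(f"float32: {float32_gb:.2f} GB")
print(f"4-bit:   {four_bit_gb:.2f} GB ({float32_gb / four_bit_gb:.0f}x smaller)")
print(f"2-bit:   {two_bit_gb:.2f} GB ({float32_gb / two_bit_gb:.0f}x smaller)")
```

The 8x and 16x ratios match the compression range quoted above. With dim=1536, as in the usage example, both the float32 and the quantized figures would double.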
How to use it
pip install turbovec
from turbovec import TurboQuantIndex
# No train phase. Just index.
index = TurboQuantIndex(dim=1536, bit_width=4)
index.add(vectors)
index.add(more_vectors) # online ingest
scores, ids = index.search(query, k=10)
# Need stable external IDs & O(1) deletes?
from turbovec import IdMapIndex
idx = IdMapIndex(dim=1536, bit_width=4)
idx.add_with_ids(vectors, doc_ids) # doc_ids: your own stable external IDs, not the ids returned by search above
idx.search(query, k=10, allowlist=sql_candidate_ids) # hybrid retrieval
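To see why filtering at search time matters for the allowlist call above, here is a brute-force NumPy illustration of the difference between filtering after ranking and filtering before ranking. It is not how turbovec implements its SIMD kernel; the function names and toy scores are hypothetical and only show the over-fetching problem.

```python
import numpy as np


def post_filter_search(scores, allowed, k):
    """Rank everything, keep the global top-k, then drop ids outside the allowlist."""
    top = np.argsort(-scores)[:k]
    return [int(i) for i in top if int(i) in allowed]


def pre_filter_search(scores, allowed, k):
    """Restrict to the allowlist first, then rank only the allowed ids."""
    ids = np.array(sorted(allowed), dtype=np.int64)
    order = np.argsort(-scores[ids])[:k]
    return [int(i) for i in ids[order]]


# Toy scores: id 0 is the best match, id 999 the worst.
scores = np.linspace(1.0, 0.0, 1000)
# A sparse allowlist, e.g. the ids returned by a SQL pre-query.
allowed = set(range(5, 1000, 100))  # {5, 105, 205, ..., 905}

post = post_filter_search(scores, allowed, k=10)
pre = pre_filter_search(scores, allowed, k=10)

print(post)  # [5]
print(pre)   # [5, 105, 205, 305, 405, 505, 605, 705, 805, 905]
```

Post-filtering returns a single hit because only one allowed id lands in the global top 10. The usual workaround is to fetch far more than k and hope enough survive. Filtering before scoring avoids that guesswork.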
Drop into your stack
Swap your in-memory vector store in one line. Same API, same pipeline wiring:
pip install turbovec[langchain]
pip install turbovec[llama-index]
pip install turbovec[haystack]
pip install turbovec[agno]

How it works (in a sentence)
turbovec runs Google Research’s TurboQuant algorithm: normalize → apply a fixed random rotation (making coordinate distributions predictable) → precomputed Lloyd-Max scalar quantization → length-renormalized scoring. The math replaces codebook training. SIMD replaces decompression. Result: distortion within 2.7x of Shannon’s lower bound, with zero data dependency.
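The four steps can be sketched in plain NumPy. This is a toy illustration of the idea, not turbovec's implementation and not a faithful reproduction of TurboQuant. It assumes rotated coordinates are approximately Gaussian (reasonable in high dimensions), stores one code per byte instead of bit-packing, and reads "length-renormalized scoring" as rescaling each reconstructed vector back to unit length before the dot product. All function names are hypothetical.

```python
import math
import numpy as np


def lloyd_max_gaussian(bits, iters=200):
    """Lloyd-Max centroids and cell boundaries for a standard normal source.

    Derived from the N(0, 1) density alone, so no training data is involved.
    """
    pdf = lambda x: math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)
    cdf = lambda x: 0.5 * (1 + math.erf(x / math.sqrt(2)))
    centroids = np.linspace(-2.0, 2.0, 2 ** bits)
    for _ in range(iters):
        bounds = (centroids[:-1] + centroids[1:]) / 2
        edges = [-math.inf, *bounds, math.inf]
        # Move each centroid to the conditional mean of its cell.
        centroids = np.array([(pdf(a) - pdf(b)) / (cdf(b) - cdf(a))
                              for a, b in zip(edges[:-1], edges[1:])])
    return centroids, (centroids[:-1] + centroids[1:]) / 2


def random_rotation(dim, seed=0):
    """Fixed random orthogonal matrix from the QR decomposition of a Gaussian matrix."""
    rng = np.random.default_rng(seed)
    q, r = np.linalg.qr(rng.standard_normal((dim, dim)))
    return q * np.sign(np.diag(r))


def encode(vectors, rotation, bounds):
    """Normalize, rotate, then map each coordinate to the index of its quantizer cell."""
    unit = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    # After rotation each coordinate is roughly N(0, 1/dim); rescale to N(0, 1).
    rotated = unit @ rotation.T * math.sqrt(vectors.shape[1])
    return np.searchsorted(bounds, rotated).astype(np.uint8)


def cosine_scores(query, codes, rotation, centroids):
    """Approximate cosine similarity between the query and every encoded vector."""
    q_rot = rotation @ (query / np.linalg.norm(query))
    recon = centroids[codes]
    recon = recon / np.linalg.norm(recon, axis=1, keepdims=True)  # length renormalization
    return recon @ q_rot


dim, bits = 128, 4
rng = np.random.default_rng(1)
vectors = rng.standard_normal((1000, dim))

centroids, bounds = lloyd_max_gaussian(bits)  # depends only on the bit width
rotation = random_rotation(dim)               # depends only on dim and the seed
codes = encode(vectors, rotation, bounds)     # no pass over the data beforehand

query = vectors[42]
approx = cosine_scores(query, codes, rotation, centroids)
exact = (vectors / np.linalg.norm(vectors, axis=1, keepdims=True)) @ (query / np.linalg.norm(query))

print("best match:", int(np.argmax(approx)))
print("max abs error vs exact cosine:", float(np.max(np.abs(approx - exact))))
```

Note that neither the centroids nor the rotation ever look at the vectors being indexed, which is what lets new vectors be added at any time without retraining.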
If your vector index is eating RAM, slowing down under filters, or demanding a train phase you don’t need, downgrade the footprint and upgrade the speed.
pip install turbovec and ship it. 🔍💨
