10M Vectors. 4GB RAM. Zero Training. Meet turbovec

Hazem Abbas

08 Jun 2026 — 3 min read

Vector search shouldn’t cost you 30GB of RAM, a separate training phase, or recall hits when you add filters. If you’re building RAG systems that care about memory, latency, or privacy, FAISS is starting to feel heavy.

turbovec

turbovec is a Rust-native vector index with Python bindings that compresses float32 embeddings 8–16x, skips the train step entirely, and matches or beats FAISS on search speed.

Why it’s different

Faster than FAISS: Hand-tuned NEON (ARM) and AVX-512 (x86) kernels beat FAISS's IndexPQFastScan by 12–20% on ARM and match or beat it on x86.
31GB → 4GB: A 10M-vector corpus of 768-dimensional embeddings shrinks from ~31GB in float32 to ~4GB with 4-bit quantization. No recall cliff.
Zero training, online ingest: Add vectors anytime. No codebook training, no rebuilds, no parameter tuning. The index grows with your data.
Native filtering at search time: Pass an allowlist to .search() and the SIMD kernel skips disallowed blocks before scoring. No over-fetching. No recall penalty.
100% local: No managed service. No telemetry. Pair with any open embedding model for a fully air-gapped RAG stack.
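The 31GB → 4GB figure is plain arithmetic: float32 spends 32 bits per coordinate and a 4-bit code spends 4, an 8x reduction (2 bits per coordinate would give the 16x end of the range). The sketch below assumes 768-dimensional embeddings, the dimension at which 10M float32 vectors come to ~31GB, and it ignores per-vector metadata such as stored norms or IDs; `footprint_gb` is a hypothetical helper, not part of turbovec.

```python
N_VECTORS = 10_000_000


def footprint_gb(n_vectors: int, dim: int, bits_per_coord: int) -> float:
    """Raw payload in GB (10^9 bytes): n * dim * bits / 8, ignoring per-vector metadata."""
    return n_vectors * dim * bits_per_coord / 8 / 1e9


# Compare float32 storage with 4-bit and 2-bit codes for two common embedding sizes.
for dim in (768, 1536):
    f32 = footprint_gb(N_VECTORS, dim, 32)
    for bits in (4, 2):
        q = footprint_gb(N_VECTORS, dim, bits)
        print(f"dim={dim}: float32 {f32:.2f} GB -> {bits}-bit {q:.2f} GB ({f32 / q:.0f}x smaller)")
```

At dim=1536, as in the usage example below, the same 10M vectors take roughly twice as much memory at every bit width.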

How to use it

pip install turbovec

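import numpy as np
from turbovec import TurboQuantIndex

# Placeholder data; assumes float32 NumPy arrays with one vector per row
rng = np.random.default_rng(0)
vectors = rng.standard_normal((10_000, 1536), dtype=np.float32)
more_vectors = rng.standard_normal((1_000, 1536), dtype=np.float32)
query = rng.standard_normal((1, 1536), dtype=np.float32)  # assumed shape: (n_queries, dim)

# No train phase. Just index.
index = TurboQuantIndex(dim=1536, bit_width=4)
index.add(vectors)
index.add(more_vectors)  # online ingest

scores, ids = index.search(query, k=10)

# Need stable external IDs & O(1) deletes?
from turbovec import IdMapIndex
idx = IdMapIndex(dim=1536, bit_width=4)
doc_ids = np.arange(10_000, dtype=np.uint64)  # your own IDs, one per vector (dtype assumed)
idx.add_with_ids(vectors, doc_ids)

sql_candidate_ids = doc_ids[:500]  # stand-in for IDs that passed a SQL WHERE clause
scores, hits = idx.search(query, k=10, allowlist=sql_candidate_ids)  # hybrid retrieval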
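Passing the allowlist into the search call is what separates pre-filtering from the common post-filtering workaround, in which you over-fetch the top k * m hits and then drop the disallowed ones; with a selective filter, that can leave you with fewer than k results. The brute-force NumPy sketch below contrasts the two strategies. It illustrates the general idea only, not turbovec's SIMD kernel (which, per the feature list, skips disallowed blocks before scoring), and `prefilter_search`, `postfilter_search`, and `overfetch` are hypothetical names.

```python
import numpy as np


def prefilter_search(db: np.ndarray, query: np.ndarray, k: int, allowlist: set[int]) -> np.ndarray:
    """Score only allowed rows, then take the top k: the filter runs before ranking."""
    allowed = np.array(sorted(allowlist), dtype=np.int64)
    scores = db[allowed] @ query
    return allowed[np.argsort(-scores)[:k]]


def postfilter_search(
    db: np.ndarray, query: np.ndarray, k: int, allowlist: set[int], overfetch: int = 4
) -> np.ndarray:
    """Rank every row, keep the top k * overfetch, then drop disallowed rows."""
    candidates = np.argsort(-(db @ query))[: k * overfetch]
    kept = [int(i) for i in candidates if int(i) in allowlist]
    return np.array(kept[:k], dtype=np.int64)


rng = np.random.default_rng(3)
db = rng.standard_normal((10_000, 64))
query = rng.standard_normal(64)
allowlist = set(range(0, 10_000, 100))  # a selective filter: 1% of rows pass

pre = prefilter_search(db, query, 10, allowlist)
post = postfilter_search(db, query, 10, allowlist)
print(len(pre), len(post))  # post-filtering usually comes back short here
```

Raising `overfetch` narrows the gap, but the amount needed grows as the filter gets more selective, and every extra candidate is scored and then thrown away; pre-filtering avoids that trade-off.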

Drop into your stack

Swap your in-memory vector store in one line. Same API, same pipeline wiring:

pip install "turbovec[langchain]"
pip install "turbovec[llama-index]"
pip install "turbovec[haystack]"
pip install "turbovec[agno]"

How it works (in a sentence)

turbovec runs Google Research’s TurboQuant algorithm: normalize → apply a fixed random rotation (so every coordinate follows a known distribution, whatever the input) → quantize each coordinate with a precomputed Lloyd-Max scalar quantizer → score with length renormalization. Because the coordinate distribution is known in advance, the math replaces codebook training; because the SIMD kernels score the compressed codes directly, nothing has to be decompressed. Result: distortion within 2.7x of Shannon’s lower bound, with zero data dependency.
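The one-sentence pipeline can be made concrete with a toy NumPy sketch. It illustrates the TurboQuant idea under simplifying assumptions and is not turbovec's code: `random_rotation`, `lloyd_max_gaussian`, and `ToyTurboQuant` are hypothetical names; the rotated coordinates are approximated as Gaussian N(0, 1/d) (the exact distribution is a scaled Beta, which approaches that Gaussian as d grows); and "length-renormalized scoring" is interpreted here as rescaling each reconstructed unit vector back to unit length before multiplying by the stored norm. The key property still shows: the codebook depends only on `dim` and `bits`, never on the indexed data.

```python
import numpy as np


def random_rotation(dim: int, seed: int = 0) -> np.ndarray:
    """Fixed random orthogonal matrix (QR of a Gaussian matrix, sign-corrected)."""
    rng = np.random.default_rng(seed)
    q, r = np.linalg.qr(rng.standard_normal((dim, dim)))
    return q * np.sign(np.diag(r))  # flip column signs so Q is uniformly distributed


def lloyd_max_gaussian(bits: int, sigma: float, iters: int = 50, seed: int = 1) -> np.ndarray:
    """Lloyd-Max centroids for N(0, sigma^2), fit on synthetic samples (never on your data)."""
    rng = np.random.default_rng(seed)
    samples = rng.standard_normal(200_000) * sigma
    levels = 2**bits
    centroids = np.quantile(samples, (np.arange(levels) + 0.5) / levels)
    for _ in range(iters):
        boundaries = (centroids[:-1] + centroids[1:]) / 2  # nearest-centroid thresholds
        bins = np.searchsorted(boundaries, samples)
        sums = np.bincount(bins, weights=samples, minlength=levels)
        counts = np.bincount(bins, minlength=levels)
        centroids = sums / counts  # move each centroid to the mean of its cell
    return centroids


class ToyTurboQuant:
    """Hypothetical toy index: normalize -> rotate -> scalar-quantize -> renormalized scoring."""

    def __init__(self, dim: int, bits: int = 4, seed: int = 0):
        self.rotation = random_rotation(dim, seed)
        # After rotation, each coordinate of a unit vector is approximately N(0, 1/dim).
        self.centroids = lloyd_max_gaussian(bits, sigma=1.0 / np.sqrt(dim))
        self.boundaries = (self.centroids[:-1] + self.centroids[1:]) / 2
        self.codes = np.empty((0, dim), dtype=np.uint8)
        self.norms = np.empty(0)

    def add(self, x: np.ndarray) -> None:
        norms = np.linalg.norm(x, axis=1)
        rotated = (x / norms[:, None]) @ self.rotation.T  # normalize, then rotate each row
        codes = np.searchsorted(self.boundaries, rotated).astype(np.uint8)
        self.codes = np.vstack([self.codes, codes])  # online ingest: just append
        self.norms = np.concatenate([self.norms, norms])

    def search(self, query: np.ndarray, k: int):
        q = self.rotation @ query  # same rotation on both sides preserves inner products
        recon = self.centroids[self.codes]  # dequantize; a SIMD kernel would use lookup tables
        # Rescale each reconstruction to unit length, then restore the stored norm.
        scores = (recon @ q) / np.linalg.norm(recon, axis=1) * self.norms
        top = np.argsort(-scores)[:k]
        return scores[top], top
```

With dim=128 and 4 bits, each vector's payload is 64 bytes instead of 512 in float32. This toy stores one code per `uint8`, wasting half of that; a real index packs two 4-bit codes per byte.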

If your vector index is eating RAM, slowing down under filters, or demanding a train phase you don’t need, downgrade the footprint and upgrade the speed.

pip install turbovec and ship it. 🔍💨
