TurboQuant WASM: Compress Vector Indexes 6x and Search Directly in the Browser


TurboQuant WASM: Vector Compression at the Edge

Embedding indexes are memory hogs. One million 384-dimensional float32 vectors weigh in at 1.5 GB; on a mobile device, that is minutes of download time and a significant chunk of RAM. TurboQuant WASM shrinks them to ~240 MB, roughly 6x compression, and lets you search the compressed data directly, without ever decompressing it first.
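
The arithmetic is easy to sanity-check. A quick back-of-envelope in TypeScript, taking the ~4.5 bits/dim figure from the Quick Start section below (the quoted ~240 MB leaves headroom for per-vector overhead):

// Back-of-envelope sizing. Figures from this page: float32 = 4 bytes/dim,
// TurboQuant ≈ 4.5 bits/dim; the quoted ~240 MB allows for per-vector overhead.
const vectors = 1_000_000;
const dim = 384;

const rawBytes = vectors * dim * 4;                        // 1,536,000,000 ≈ 1.5 GB
const payloadBytes = vectors * Math.ceil((dim * 4.5) / 8); // 216 bytes/vector ≈ 216 MB

console.log((rawBytes / 1e9).toFixed(2), "GB raw");                    // "1.54 GB raw"
console.log((payloadBytes / 1e6).toFixed(0), "MB compressed payload"); // "216 MB"
console.log((rawBytes / payloadBytes).toFixed(1) + "x");               // "7.1x" before overhead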

Built on the Google Research paper “TurboQuant: Online Vector Quantization with Near-optimal Distortion Rate” (ICLR 2026), this package wraps a Zig → WASM build with relaxed SIMD and optional WebGPU acceleration into a ~12 kB gzipped npm module that runs entirely in the browser or Node.js. No server. No training step. No dataset-dependent configuration.


What It Does

🗜️ 6x Compression, No Training

Unlike Product Quantization (PQ/OPQ) methods that require a training pass over your dataset, TurboQuant is online: initialize with dim and seed, then encode any vector immediately. Each compressed vector is self-contained, making it ideal for streaming data, LLM KV caches, and real-time indexing where you cannot pause to build a codebook.

| Scenario | Raw Float32 | TurboQuant | Savings |
| --- | --- | --- | --- |
| 1M × 384-dim vectors | 1.5 GB | ~240 MB | ~6x |
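
In code, "online" means a streaming indexer is just a loop. A minimal sketch using the Quick Start API below, where embedStream is a hypothetical source of embeddings standing in for your model or pipeline:

import { TurboQuant } from "turboquant-wasm";

// Hypothetical embedding source; stands in for your model or pipeline.
declare function embedStream(): AsyncIterable<Float32Array>;

// No training pass: encode each vector the moment it arrives.
const tq = await TurboQuant.init({ dim: 384, seed: 7 });
const index: Uint8Array[] = [];

for await (const vec of embedStream()) {
  index.push(tq.encode(vec)); // each encoded vector is self-contained
}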

⚡ Search Without Decompression

TurboQuant preserves inner products well enough for approximate search. You can run dot() on a single compressed vector, or dotBatch() across an entire index. The batch call automatically detects WebGPU and dispatches a compute shader that scores compressed vectors directly on the GPU. No decompression step, no float32 round-trip.
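
Combined with a sort, dotBatch() is enough for brute-force top-k search. A sketch, assuming the index is a concatenation of encode() outputs with a fixed encoded size per vector:

import { TurboQuant } from "turboquant-wasm";

// Brute-force top-k over a compressed index: score everything, sort, slice.
// Vectors are never decompressed; scoring runs on WebGPU when available.
async function topK(
  tq: TurboQuant,
  query: Float32Array,
  index: Uint8Array, // concatenated tq.encode() outputs
  bytesPerVector: number,
  k: number,
): Promise<{ id: number; score: number }[]> {
  const scores = await tq.dotBatch(query, index, bytesPerVector);
  return Array.from(scores, (score, id) => ({ id, score }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k);
}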

🔮 Two Substrates, One Algorithm

The library ships two implementations of the same TurboQuant math:

  • WASM SIMD: the turboquant-wasm npm package, the CPU path for vector search, image similarity, and 3D Gaussian Splatting compression.
  • WGSL Compute Shaders: a GPU-native reimplementation for workloads that need real-time throughput. The live Prompt → Diagram demo and the in-browser Gemma 4 E2B LLM both run the algorithm on the GPU so the KV cache stays compressed during inference.

Quick Start

import { TurboQuant } from "turboquant-wasm";

const tq = await TurboQuant.init({ dim: 1024, seed: 42 });

// Compress a vector (~4.5 bits/dim)
const compressed = tq.encode(myFloat32Array); // Uint8Array

// Decode back when you need the original
const decoded = tq.decode(compressed); // Float32Array

// Fast dot product without decoding
const score = tq.dot(queryVector, compressed);

// Batch search across an index
const scores = await tq.dotBatch(
  queryVector,
  allCompressed, // concatenated Uint8Array
  bytesPerVector,
);

tq.destroy();

dotBatch() prefers WebGPU when available (Chrome/Edge 113+), and falls back transparently to WASM SIMD on devices without GPU support.
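
The dispatch logic is internal to the package, but the detection pattern is the standard one. A sketch of how such a check typically looks (illustrative, not the library's actual code):

// Standard WebGPU capability check (illustrative; not turboquant-wasm internals).
// The `any` cast avoids needing @webgpu/types for this sketch.
async function hasWebGPU(): Promise<boolean> {
  const gpu = (globalThis.navigator as any)?.gpu;
  if (!gpu) return false;                  // no WebGPU API exposed
  const adapter = await gpu.requestAdapter();
  return adapter !== null;                 // adapter can be null on unsupported hardware
}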


API

class TurboQuant {
  static async init(config: { dim: number; seed: number }): Promise<TurboQuant>;
  encode(vector: Float32Array): Uint8Array;
  decode(compressed: Uint8Array): Float32Array;
  dot(query: Float32Array, compressed: Uint8Array): number;
  dotBatch(
    query: Float32Array,
    compressedConcat: Uint8Array,
    bytesPerVector: number,
  ): Promise<Float32Array>;
  rotateQuery(query: Float32Array): Float32Array;
  destroy(): void;
}
  • encode / decode: single-vector compression and reconstruction.
  • dot: scalar dot product between a float32 query and one compressed vector.
  • dotBatch: scores a query against many compressed vectors. Auto-detects WebGPU.
  • rotateQuery: pre-rotates a query for faster repeated batch scoring (see the sketch after this list).
  • destroy: releases WASM memory.
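
The list above does not spell out how rotateQuery composes with the scoring calls, so treat this as a hypothetical sketch: it assumes a pre-rotated query can be passed wherever a raw query is accepted, paying the rotation cost once across several index shards. Verify against the package docs before relying on it.

import { TurboQuant } from "turboquant-wasm";

// Hypothetical: amortize query rotation across shards. Assumes dotBatch
// accepts a pre-rotated query in place of a raw one; check the docs first.
async function scoreShards(
  tq: TurboQuant,
  query: Float32Array,
  shards: { data: Uint8Array; bytesPerVector: number }[],
): Promise<Float32Array[]> {
  const rotated = tq.rotateQuery(query); // rotate once, reuse per shard
  return Promise.all(
    shards.map((s) => tq.dotBatch(rotated, s.data, s.bytesPerVector)),
  );
}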

Browser Requirements

The WASM binary uses relaxed SIMD instructions. Supported runtimes:

| Runtime | Minimum Version |
| --- | --- |
| Chrome / Edge | 114+ |
| Firefox | 128+ |
| Safari | 18+ |
| Node.js | 20+ |

WebGPU batch scoring requires Chrome/Edge 113+.
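
If you would rather gate loading at runtime than sniff versions, the separate wasm-feature-detect library (an assumption here; it is not part of turboquant-wasm) exposes a relaxedSimd() probe:

import { relaxedSimd } from "wasm-feature-detect";

// Probe for relaxed SIMD before loading the module; fall back gracefully.
if (await relaxedSimd()) {
  const { TurboQuant } = await import("turboquant-wasm");
  const tq = await TurboQuant.init({ dim: 384, seed: 42 });
  // ... proceed as in Quick Start
} else {
  console.warn("Relaxed SIMD unavailable; TurboQuant WASM cannot run here.");
}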


When to Use TurboQuant (and When Not To)

| | TurboQuant | PQ / OPQ (FAISS, ScaNN) |
| --- | --- | --- |
| Compression | ~4.5 bits/dim (~6x) | ~1–2 bits/dim (16–32x) |
| Query speed | Slower (float decode per pair) | Faster (integer codebook lookup) |
| Training | None: encode any vector immediately | Required: must train on the dataset |
| Streaming data | Yes: each vector is self-contained | Degrades if the distribution shifts |
| Deployment | npm install + 3 lines of code | Dataset-dependent configuration |
| Size | ~12 kB gzipped | Usually much larger |

Use TurboQuant when vectors arrive continuously (LLM KV cache, real-time indexing), you cannot afford a training step, you need simple browser or edge deployment, or you want a dependency-free npm package.

Use PQ/OPQ when you have a static dataset, can train offline, and need the absolute fastest queries with maximum compression.




Quality Guarantees

  • Bit-identical output with the reference Zig implementation for the same input + seed.
  • MSE decreases as dimension increases (verified on unit vectors).
  • Dot-product preservation: mean absolute error < 1.0 for unit vectors at dim=128.
  • Golden-value tests confirm correctness across encode, decode, and scoring paths.
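
These are easy to spot-check yourself. A sketch that measures mean absolute dot-product error on random unit vectors at dim=128, with the threshold taken from the list above:

import { TurboQuant } from "turboquant-wasm";

// Spot-check the dot-product guarantee on random unit vectors.
function randomUnitVector(dim: number): Float32Array {
  const v = new Float32Array(dim);
  for (let i = 0; i < dim; i++) v[i] = Math.random() * 2 - 1;
  const norm = Math.hypot(...v);
  return v.map((x) => x / norm);
}

const dim = 128;
const trials = 1000;
const tq = await TurboQuant.init({ dim, seed: 1 });

let totalError = 0;
for (let t = 0; t < trials; t++) {
  const q = randomUnitVector(dim);
  const v = randomUnitVector(dim);
  const exact = q.reduce((sum, x, i) => sum + x * v[i], 0);
  totalError += Math.abs(tq.dot(q, tq.encode(v)) - exact);
}

console.log(`mean |error| = ${(totalError / trials).toFixed(3)}`); // expect < 1.0
tq.destroy();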

Installation

npm install turboquant-wasm

No additional build tools or native dependencies are required at install time. The WASM binary is embedded in the package.

Building from source (if you want to hack on the Zig implementation):

# Run Zig tests
zig test -target aarch64-macos src/turboquant.zig

# Full npm build (Zig → wasm-opt → base64 embed → bundle + tsc)
bun run build

# WASM only
bun run build:zig

Requires Zig 0.15.2 and Bun.




Why This Tool Rocks

  • Tiny footprint: ~12 kB gzipped, smaller than most image assets.
  • No training: encode vectors as they arrive; perfect for streaming and LLM caches.
  • Browser-native: runs in Chrome, Firefox, Safari, and Node.js with no server round-trips.
  • GPU-accelerated: WebGPU batch scoring when available, WASM SIMD fallback when not.
  • Near-optimal distortion: backed by peer-reviewed Google Research with proven quality bounds.
  • Open source: MIT licensed, with bit-identical verification against the reference Zig code.
  • Dual substrate: the same algorithm in WASM for CPU and WGSL for GPU, so you can choose the right hardware path for your workload.

May the wolf croak! 🐺