Summary
A from-scratch PyTorch implementation of TurboQuant (ICLR 2026), Google's two-stage vector quantization algorithm for compressing LLM key-value (KV) caches.
We implemented the algorithm from the paper, validated it on synthetic vectors, and then tested it on a real model's KV cache (Qwen2.5-3B-Instruct on an RTX 3060) to verify the compression and accuracy claims.
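To make the "two-stage" idea concrete, here is a minimal illustrative sketch of residual quantization: a first stage quantizes each vector coarsely, and a second stage quantizes the leftover residual, so the combined reconstruction error is smaller than either stage alone. This is a generic illustration in NumPy, not the paper's exact quantizers; the uniform scalar quantizer and bit widths below are assumptions for demonstration only.

```python
import numpy as np

def uniform_quantize(x, bits):
    # Illustrative uniform scalar quantizer over x's observed range.
    levels = 2 ** bits
    lo, hi = float(x.min()), float(x.max())
    scale = (hi - lo) / (levels - 1)
    codes = np.round((x - lo) / scale).astype(np.int64)
    return codes, lo, scale

def dequantize(codes, lo, scale):
    return codes * scale + lo

rng = np.random.default_rng(0)
x = rng.standard_normal(4096).astype(np.float32)

# Stage 1: coarse quantization of the vector itself.
c1, lo1, s1 = uniform_quantize(x, bits=2)
x1 = dequantize(c1, lo1, s1)

# Stage 2: quantize the residual left over by stage 1.
r = x - x1
c2, lo2, s2 = uniform_quantize(r, bits=2)
x2 = x1 + dequantize(c2, lo2, s2)

err1 = np.mean((x - x1) ** 2)  # one-stage reconstruction MSE
err2 = np.mean((x - x2) ** 2)  # two-stage reconstruction MSE
```

Running this, the two-stage error `err2` comes out well below the one-stage error `err1`, which is the basic effect the two-stage design exploits (TurboQuant's actual stages are tuned per the paper, e.g. for distortion-rate optimality, rather than this naive uniform scheme).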