← HomeLogin
From-scratch PyTorch implementation of Google's TurboQuant
~ai.algorithms~ai.llms~dev.sourcepython
github.com 4 days ago

Summary

A from-scratch PyTorch implementation of TurboQuant (ICLR 2026), Google's two-stage vector quantization algorithm for compressing LLM key-value caches.

We implemented the algorithm from the paper, validated it on synthetic vectors, then tested it against a real model's KV cache (Qwen2.5-3B-Instruct on an RTX 3060) to verify the compression and accuracy claims.