
Glossary Term

Quantization

Storing model weights at lower precision (e.g., 4-bit) to save memory and run faster.

Quantization reduces the precision of model weights from 16- or 32-bit floats down to 8 bits, 4 bits, or even lower. A 70B-parameter model at 16-bit precision needs ~140GB of VRAM just for weights; quantized to 4-bit it fits in ~40GB and runs noticeably faster.
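
The memory math is just parameters × bits per weight. A quick back-of-the-envelope sketch (the 70B figure mirrors the example above; real deployments add overhead for the KV cache and activations on top of these numbers):

```python
# Rough VRAM needed for raw weights only (no KV cache or activations).
PARAMS = 70e9  # 70B parameters, matching the example above

for bits in (32, 16, 8, 4):
    gb = PARAMS * bits / 8 / 1e9  # params * bytes per weight -> GB
    print(f"{bits:>2}-bit weights: ~{gb:.0f} GB")

# 32-bit: ~280 GB, 16-bit: ~140 GB, 8-bit: ~70 GB, 4-bit: ~35 GB
```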

The trade-off is quality: aggressive quantization (3-bit, 2-bit) starts to hurt model performance, especially on harder reasoning tasks. The sweet spots most teams settle on are 8-bit (almost no quality loss) and 4-bit (small loss, big savings). Look for GGUF, GPTQ, or AWQ formats when self-hosting.
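
Under the hood, these formats are variations on the same move: store each weight as a small integer plus a shared scale factor, and convert back to floats at inference time. A minimal, self-contained sketch of symmetric int8 quantization in NumPy; real formats like GGUF, GPTQ, and AWQ add per-group scales, calibration, and other refinements, so treat this as an illustration of the idea rather than any specific format:

```python
import numpy as np

# Symmetric int8 quantization of one weight tensor (illustrative only).
weights = np.random.randn(4096).astype(np.float32)

scale = np.abs(weights).max() / 127                                 # map the largest weight to 127
q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)   # 1 byte per weight instead of 4
deq = q.astype(np.float32) * scale                                  # what the model computes with

print("bytes per weight:", q.itemsize, "vs", weights.itemsize)
print("max round-trip error:", np.abs(weights - deq).max())
```

The round-trip error is the quality cost the section above describes: the fewer bits you keep, the coarser the grid of representable values and the larger that error gets.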

Bring this to your business

Knowing the term is one thing. Shipping it is another.

We do two-week AI Sprints — one term, one workflow, into production by Day 10.