LLM Cloud Hub
Glossary

Quantization (bf16, fp8, awq-int4, gguf-q4 …)

Compressing model weights to fewer bits per parameter to fit on smaller GPUs.

Fewer bits per parameter means less memory per parameter: a 70B model needs ~140 GB of VRAM in bf16, ~70 GB in fp8, and ~35 GB in awq-int4. Quality degrades slightly with each step down. fp8 is nearly indistinguishable from bf16, while int4 introduces measurable but usually acceptable quality loss for most use cases.
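The arithmetic behind those numbers is simply parameter count × bits per parameter. The sketch below is a minimal illustration in plain Python, not tied to any inference library; the bit widths are nominal, and the gguf-q4 figure is an assumption since q4 block formats carry extra scale metadata.

# Hypothetical sketch: estimate weight-only VRAM at different quantization levels.
# Real deployments also need memory for activations and the KV cache.
BITS_PER_PARAM = {
    "bf16": 16,
    "fp8": 8,
    "awq-int4": 4,
    "gguf-q4": 4.5,  # assumed: q4 block formats store per-block scales
}

def weight_vram_gb(n_params: float, fmt: str) -> float:
    """Approximate GB needed just to hold the weights."""
    bits = BITS_PER_PARAM[fmt]
    return n_params * bits / 8 / 1e9  # bits -> bytes -> GB

if __name__ == "__main__":
    for fmt in BITS_PER_PARAM:
        print(f"70B in {fmt:>9}: ~{weight_vram_gb(70e9, fmt):.0f} GB")
    # bf16 ~140 GB, fp8 ~70 GB, awq-int4 ~35 GB, matching the figures above.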
