Batched serving
Running multiple inference requests through the same GPU forward pass. Modern stacks (vLLM, TGI, SGLang) do continuous batching: requests join and leave the batch as they arrive. This is why one self-hosted GPU can serve 50–100 concurrent users for roughly the same total cost as a single user, dividing the per-token cost accordingly.
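A minimal sketch of the batched path using vLLM's offline Python API; the model id and sampling settings are illustrative assumptions, and the continuous batching itself happens inside the engine:

```python
# A minimal sketch of batched offline inference with vLLM; the model id is
# an illustrative assumption, and the batching happens inside the engine.
from vllm import LLM, SamplingParams

prompts = [
    "Explain continuous batching in one sentence.",
    "Why does batching lower per-token cost?",
    "What is a KV cache?",
]
params = SamplingParams(temperature=0.7, max_tokens=128)

llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")  # assumed model id
outputs = llm.generate(prompts, params)  # all prompts share forward passes
for out in outputs:
    print(out.outputs[0].text)
```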
-
Self-hosting (TCO)
Running open-weight models on your own (or rented) hardware instead of paying an inference API.
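A back-of-the-envelope sketch of the TCO (total cost of ownership) arithmetic, assuming a hypothetical $2/hr GPU rental, 2,500 aggregate tok/s, and 50% utilization:

```python
# Back-of-the-envelope cost per million output tokens for a rented GPU.
# Every figure below is an illustrative assumption, not a quoted price.
gpu_hourly_usd = 2.00       # assumed rental price for one GPU
throughput_tok_s = 2_500    # assumed aggregate tok/s across the whole batch
utilization = 0.5           # assumed fraction of each hour spent generating

tokens_per_hour = throughput_tok_s * 3600 * utilization
cost_per_million_tokens = gpu_hourly_usd / tokens_per_hour * 1_000_000
print(f"${cost_per_million_tokens:.2f} per 1M output tokens")  # ~$0.44 here
```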
-
tok/s (throughput)
Tokens generated per second after the first one.
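A sketch of how this is typically measured, with a placeholder token generator standing in for a real streaming API; the point is that the first token (and hence prefill latency) is excluded from the calculation:

```python
import time

def fake_stream(n_tokens=50, delay=0.02):
    """Placeholder token source standing in for a real streaming API."""
    for i in range(n_tokens):
        time.sleep(delay)  # pretend per-token decode latency
        yield f"tok{i}"

first_token_at = None
count = 0
for _ in fake_stream():
    count += 1
    if first_token_at is None:
        first_token_at = time.monotonic()  # time-to-first-token boundary
t_end = time.monotonic()

# The first token is excluded, so prefill time does not inflate the number.
decode_tok_s = (count - 1) / (t_end - first_token_at)
print(f"{decode_tok_s:.1f} tok/s")
```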
-
Quantization (bf16, fp8, awq-int4, gguf-q4 …)
Compressing model weights to fewer bits per parameter to fit on smaller GPUs.
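A rough sketch of why this matters for GPU fit, assuming a hypothetical 70B-parameter model and idealized bits per weight (real quantized files carry some per-block overhead):

```python
# Rough weight-memory footprint per precision. Pure arithmetic: assumes a
# hypothetical 70B-parameter model, idealized bits per weight, and ignores
# the KV cache and activation memory.
params = 70e9
bytes_per_param = {"bf16": 2.0, "fp8": 1.0, "awq-int4": 0.5, "gguf-q4": 0.5}

for fmt, b in bytes_per_param.items():
    print(f"{fmt:>8}: {params * b / 1e9:.0f} GB")
# bf16 needs ~140 GB (multiple GPUs); int4 needs ~35 GB (one 48 GB card).
```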