Always-on vs. inference-only
Dedicated 24/7 GPU billing vs. scale-to-zero serverless inference.
Always-on = a dedicated GPU rented 24/7 (720 h × $/hr per month, regardless of traffic). Inference-only = scale-to-zero serverless billing (RunPod Serverless, Modal, etc.), where you pay only for the seconds your workload is actually running. At high utilization, always-on sets the cost floor; at low utilization, inference-only does.
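A minimal sketch of the crossover, using placeholder rates (the $2.00/hr dedicated price and $0.001/s serverless price below are assumptions for illustration, not quotes from any provider):

```python
# Rough cost comparison: always-on GPU vs. scale-to-zero serverless.
# All rates are illustrative placeholders, not real provider pricing.

HOURS_PER_MONTH = 720

def always_on_cost(hourly_rate: float) -> float:
    """Dedicated GPU billed 24/7, regardless of traffic."""
    return HOURS_PER_MONTH * hourly_rate

def serverless_cost(busy_hours: float, per_second_rate: float) -> float:
    """Serverless billing: pay only for seconds the GPU is actually busy."""
    return busy_hours * 3600 * per_second_rate

if __name__ == "__main__":
    hourly = 2.00          # assumed $/hr for a dedicated GPU
    per_second = 0.001     # assumed $/s for equivalent serverless compute
    for utilization in (0.01, 0.10, 0.40, 1.00):
        busy_hours = utilization * HOURS_PER_MONTH
        dedicated = always_on_cost(hourly)
        serverless = serverless_cost(busy_hours, per_second)
        cheaper = "serverless" if serverless < dedicated else "always-on"
        print(f"{utilization:>4.0%} utilization: always-on ${dedicated:,.0f} "
              f"vs serverless ${serverless:,.0f} -> {cheaper}")
```

With these assumed rates the break-even sits around 55% utilization: below it the serverless bill stays under the fixed $1,440/month dedicated bill, above it the always-on GPU wins.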