LLM Cloud Hub
Glossary

Prompt caching / cache hit %

A discount providers give on input tokens that repeat an already-seen prefix. Common in RAG and chatbot workloads.

Most major providers (OpenAI, Anthropic, Google) discount, or in some cases fully waive, the cost of input tokens whose prefix has been seen recently in the same conversation. RAG and chatbot workloads commonly reach 50–80% cache hit rates because the system prompt and retrieved context are repeated on every turn. Set this to your realistic hit rate to get an honest cost estimate.
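The blended cost described above can be sketched as a small function. The function name, parameter names, and the example prices are illustrative assumptions, not any provider's published rate card; `cacheDiscount` is the fraction taken off the per-token price for cached tokens (e.g. 0.9 for a 90% discount).

```typescript
// Blended input cost with prompt caching.
// All names and numbers here are illustrative assumptions, not real rate cards.
function effectiveInputCostUSD(
  tokens: number,        // total input tokens for the period
  pricePerMTok: number,  // list price in USD per 1M uncached input tokens
  hitRate: number,       // fraction of tokens served from cache (0..1)
  cacheDiscount: number, // discount on cached tokens (1.0 = free, 0.5 = half price)
): number {
  const cachedTokens = tokens * hitRate;
  const freshTokens = tokens - cachedTokens;
  // Cached tokens are billed at the discounted rate; fresh tokens at full price.
  const billableTokens = freshTokens + cachedTokens * (1 - cacheDiscount);
  return (billableTokens * pricePerMTok) / 1_000_000;
}

// Example: 1M input tokens at $3/MTok, 80% hit rate, 90% discount on cached reads.
// Without caching this would cost $3.00; with caching it drops to $0.84.
const cost = effectiveInputCostUSD(1_000_000, 3.0, 0.8, 0.9);
```

This is why the hit rate matters so much: at an 80% hit rate with a deep cache discount, the effective input price is a fraction of the list price.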

Keyboard shortcuts

?    Show this overlay
/    Focus the first form field
g h  Go to / (home)
g b  Go to /best-llm-for
g c  Go to /cost
g s  Go to /self-hosted
g x  Go to /compliance
Esc  Close any overlay

Inspired by Linear and GitHub conventions. A two-key sequence (g, then h) must be completed within about one second.
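The one-second window for the two-key sequences can be sketched as a pure lookup. `resolveGoSequence`, `SEQUENCE_TIMEOUT_MS`, and `goRoutes` are hypothetical names for this sketch, not the site's actual code; the routes themselves come from the table above.

```typescript
// Sketch of the "g then <key>" navigation described above.
// resolveGoSequence and SEQUENCE_TIMEOUT_MS are illustrative names (assumptions).
const SEQUENCE_TIMEOUT_MS = 1000; // ~1 second window for the second key

const goRoutes: Record<string, string> = {
  h: "/",             // home
  b: "/best-llm-for",
  c: "/cost",
  s: "/self-hosted",
  x: "/compliance",
};

// Returns the target path if `key` completes a sequence whose "g" was pressed
// at `gPressedAt` (ms) and `now` falls inside the timeout window; else null.
function resolveGoSequence(
  gPressedAt: number | null,
  now: number,
  key: string,
): string | null {
  if (gPressedAt === null || now - gPressedAt > SEQUENCE_TIMEOUT_MS) return null;
  return goRoutes[key] ?? null;
}
```

In a browser you would record `Date.now()` on a "g" keydown, call `resolveGoSequence` from the next keydown, and navigate when the result is non-null; keeping the timing logic pure like this makes the timeout behavior easy to test.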