Token
A "chunk" of text the model reads or writes. English averages roughly 1 token ≈ 4 characters or ¾ of a word. Pricing is almost universally expressed per million tokens.
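The rule of thumb above (≈4 characters per token, pricing per million tokens) can be sketched as a back-of-the-envelope estimator. This is a rough heuristic only, not a real tokenizer, and the $3.00-per-million price is an illustrative assumption:

```python
def estimate_tokens(text: str) -> int:
    # Rough heuristic for English: ~4 characters per token.
    # Real tokenizers (BPE, etc.) will differ, especially for code or non-English text.
    return max(1, round(len(text) / 4))

def cost_usd(tokens: int, price_per_million_usd: float) -> float:
    # Providers quote prices per million tokens.
    return tokens / 1_000_000 * price_per_million_usd

text = "The quick brown fox jumps over the lazy dog."  # 44 characters
tokens = estimate_tokens(text)                         # ≈ 11 tokens
price = 3.00  # hypothetical $/1M input tokens
print(f"{tokens} tokens, ${cost_usd(tokens, price):.6f}")
```

For real billing, count tokens with the provider's own tokenizer rather than this character heuristic.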
-
Input vs. output tokens
Input tokens are what you send to the model; output tokens are what it generates back.
-
Context window
The maximum number of tokens an LLM can process in a single request, counting both the prompt and the generated output.
-
Prompt caching / cache hit %
A discount providers offer on repeated prefix tokens (e.g. a long system prompt or shared documents resent on every request). Common in RAG pipelines and chatbots.
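The effect of a cache hit rate on input cost can be sketched with a blended-price calculation. The 90% discount on cached tokens and the $3.00-per-million base price are illustrative assumptions; actual discounts and prices vary by provider:

```python
def blended_input_cost_usd(
    input_tokens: int,
    price_per_million_usd: float,
    cache_hit_frac: float,      # fraction of input tokens served from cache (0.0–1.0)
    cache_discount: float = 0.9 # hypothetical: cached tokens billed at 10% of base price
) -> float:
    cached = input_tokens * cache_hit_frac
    fresh = input_tokens - cached
    billable = fresh + cached * (1 - cache_discount)
    return billable / 1_000_000 * price_per_million_usd

# A chatbot resending a large system prompt: 80% of input tokens hit the cache.
print(f"${blended_input_cost_usd(1_000_000, 3.00, 0.8):.2f}")
```

With an 80% hit rate and a 90% discount, the effective input price drops to 28% of the base rate, which is why long, stable prefixes are worth structuring for.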