Context window
The maximum number of tokens (input + output) a model can process in a single request. A 128k context window can fit roughly 96k English words, about a 350-page book. Larger windows enable RAG, long-document summarization, and full-codebase reasoning.
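As a quick illustration of budgeting against the window, here is a minimal sketch. The 4-characters-per-token ratio is only a common rule of thumb for English text, and the `128_000` default is an assumed limit, not any specific model's:

```python
# Rough sketch: check whether a prompt plus reserved output space
# fits inside a model's context window. Uses a crude heuristic of
# ~4 characters per token, not a real tokenizer.

def estimate_tokens(text: str) -> int:
    """Crude token estimate: roughly 4 characters per token for English."""
    return max(1, len(text) // 4)

def fits_context(prompt: str, max_output_tokens: int, window: int = 128_000) -> bool:
    """True if the prompt and the reserved output budget both fit."""
    return estimate_tokens(prompt) + max_output_tokens <= window

print(fits_context("Summarize this report." * 100, max_output_tokens=4_096))
```

Because input and output share the same window, reserving output space up front avoids truncated replies.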
-
Token
A chunk of text the model reads or writes. Pricing is denominated in tokens.
-
RAG (Retrieval-Augmented Generation)
Fetching relevant documents and prepending them to the prompt for grounded answers.
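A toy sketch of the idea, with keyword overlap standing in for the embedding-based vector search real systems use; the document list and scoring here are purely illustrative:

```python
# Toy RAG sketch: score documents by word overlap with the question,
# then prepend the best matches to the prompt. Real systems retrieve
# via embedding similarity, not keyword overlap.

def retrieve(question: str, documents: list[str], k: int = 2) -> list[str]:
    """Return the k documents sharing the most words with the question."""
    q_words = set(question.lower().split())
    scored = sorted(
        documents,
        key=lambda d: len(q_words & set(d.lower().split())),
        reverse=True,
    )
    return scored[:k]

def build_prompt(question: str, documents: list[str]) -> str:
    """Prepend retrieved context so the model answers from the documents."""
    context = "\n".join(retrieve(question, documents))
    return f"Context:\n{context}\n\nQuestion: {question}"

docs = [
    "The context window caps total tokens per request.",
    "Output tokens usually cost more than input tokens.",
    "RAG grounds answers in retrieved documents.",
]
print(build_prompt("What does the context window cap?", docs))
```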
-
Input vs. output tokens
Input tokens are what you send to the model; output tokens are what it generates back.
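Since pricing is denominated in tokens, a request's cost is simple arithmetic. The per-million-token prices below are hypothetical placeholders; real prices vary by provider and model:

```python
# Hypothetical per-million-token prices; real prices vary by provider.
INPUT_PRICE_PER_M = 3.00    # dollars per 1M input tokens
OUTPUT_PRICE_PER_M = 15.00  # dollars per 1M output tokens

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one request: each side is billed at its own rate."""
    return (
        input_tokens * INPUT_PRICE_PER_M
        + output_tokens * OUTPUT_PRICE_PER_M
    ) / 1_000_000

# A 10k-token prompt with a 1k-token reply:
print(f"${request_cost(10_000, 1_000):.4f}")  # → $0.0450
```

Output tokens are typically priced several times higher than input tokens, which is why long prompts with short answers are often cheaper than the reverse.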