tok/s (throughput)
Tokens generated per second after the first token, so time to first token is excluded. Single-stream numbers (one user) differ a lot from batched numbers (many concurrent users): modern serving stacks like vLLM can achieve 5–10× higher aggregate throughput with continuous batching, though the actual gain depends on the model, hardware, and workload.
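
As a rough illustration of the definition above, the sketch below computes tok/s from per-token arrival timestamps. The function names and the timestamp-list input are assumptions made for this example and do not come from any particular serving stack. The aggregate figure uses one possible convention (all post-first tokens divided by the window from the earliest first token to the latest last token); benchmarks may define it differently.

```python
from typing import Sequence


def decode_tok_per_s(token_times: Sequence[float]) -> float:
    """Single-stream tok/s, counting only tokens after the first one.

    token_times holds the arrival time (in seconds) of each generated
    token, in order. The first token marks the start of the window.
    """
    if len(token_times) < 2:
        raise ValueError("need at least two tokens to measure throughput")
    elapsed = token_times[-1] - token_times[0]
    if elapsed <= 0:
        raise ValueError("timestamps must be increasing")
    return (len(token_times) - 1) / elapsed


def aggregate_tok_per_s(streams: Sequence[Sequence[float]]) -> float:
    """Aggregate tok/s across concurrent streams (assumed convention).

    Counts the post-first tokens of every stream and divides by the
    wall-clock window from the earliest first token to the latest
    last token.
    """
    if not streams or any(len(s) < 2 for s in streams):
        raise ValueError("every stream needs at least two tokens")
    tokens = sum(len(s) - 1 for s in streams)
    start = min(s[0] for s in streams)
    end = max(s[-1] for s in streams)
    if end <= start:
        raise ValueError("timestamps must be increasing")
    return tokens / (end - start)


# Hypothetical timestamps: one token every 0.5 s after the first.
one_user = [1.0, 1.5, 2.0, 2.5]
single = decode_tok_per_s(one_user)
batched = aggregate_tok_per_s([one_user, one_user])
```

With these made-up timestamps, two concurrent streams that each run at the single-stream rate double the aggregate number, which is why the two figures should not be compared directly.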