TTFT (time-to-first-token)
The time between sending a request and receiving the first response token. On long inputs it is dominated by prefill latency, the time the model spends processing the prompt before it can emit any output. For UX-critical traffic such as chat, TTFT matters more than total throughput.
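As a rough illustration, TTFT can be measured on the client by timing a streaming response. The sketch below assumes the client exposes the response as an iterator of tokens; `measure_ttft`, `send_request`, and `fake_stream` are hypothetical names used only for this example and do not belong to any particular library.

```python
import time
from typing import Callable, Iterable, Iterator, List, Tuple


def measure_ttft(
    send_request: Callable[[], Iterable[str]],
    clock: Callable[[], float] = time.perf_counter,
) -> Tuple[float, float, List[str]]:
    """Return (ttft_seconds, total_seconds, tokens) for one streamed request.

    `send_request` is a hypothetical callable that issues the request and
    returns an iterable of response tokens.
    """
    start = clock()  # start timing before the request is sent
    ttft = None
    tokens: List[str] = []
    for token in send_request():
        if ttft is None:
            ttft = clock() - start  # first token has just arrived
        tokens.append(token)
    total = clock() - start
    if ttft is None:
        raise ValueError("stream produced no tokens")
    return ttft, total, tokens


def fake_stream(prefill_s: float, per_token_s: float, n: int) -> Iterator[str]:
    """Stand-in for a real streaming client: waits for 'prefill', then emits tokens."""
    time.sleep(prefill_s)  # simulated prompt processing before any output
    for i in range(n):
        yield f"tok{i}"
        time.sleep(per_token_s)  # simulated per-token decode time


if __name__ == "__main__":
    ttft, total, tokens = measure_ttft(lambda: fake_stream(0.05, 0.01, 5))
    print(f"TTFT {ttft * 1000:.0f} ms, total {total * 1000:.0f} ms, {len(tokens)} tokens")
```

A client-side measurement like this also includes network and queueing delay, not only prefill, so it reflects what the user actually waits for rather than the model's prefill time in isolation.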