Phase 18 — Deep Dive: Performance Engineering

Read this with upstream/ open. Every path is relative to upstream/ at the pinned commit v0.22.1 @ 0decac0 (UPSTREAM_PIN.md). If a line number ever drifts, search for the named symbol instead.

Guided reading list
Questions to answer as you read
Cross-references

Guided reading list

Work through these in order. This is a scaffold: the reading targets and the questions are real; fill in the line-by-line annotations as you go (this is exactly the muscle a maintainer uses — reading unfamiliar code and extracting its contract).

benchmarks/ — The benchmark suite (throughput, latency, serving).
- Read it, then write 3 sentences in your lab notebook: what data structure, what invariant, what edge case.
vllm/benchmarks/ — The 'vllm bench' implementation.
- Read it, then write 3 sentences in your lab notebook: what data structure, what invariant, what edge case.
vllm/v1/metrics/ — The metrics/stats the engine exposes (Prometheus + logging).
- Read it, then write 3 sentences in your lab notebook: what data structure, what invariant, what edge case.
vllm/v1/metrics/stats.py — SchedulerStats / IterationStats: what's measured each step.
- Read it, then write 3 sentences in your lab notebook: what data structure, what invariant, what edge case.
vllm/config/scheduler.py — The tuning knobs and their defaults/semantics.
- Read it, then write 3 sentences in your lab notebook: what data structure, what invariant, what edge case.

Questions to answer as you read

Metrics that matter: throughput (tok/s), TTFT, ITL/TPOT, goodput, latency percentiles?
Little's Law and how batch size, arrival rate, and latency relate?
The roofline model: compute-bound vs memory-bound; arithmetic intensity?
Profiling: the torch profiler, Nsight Systems, and vLLM's own metrics?
The knobs: max_num_seqs, max_num_batched_tokens, gpu_memory_utilization, enable_chunked_prefill, CUDA graphs, quant, spec decode?
Benchmarking properly: vllm bench, warmup, steady state, fair comparisons?

Cross-references

Intuition: 00-guide.md
Build it yourself: 02-mini-build.md
The gold-standard depth to emulate: Phase 02 deep-dive.

vLLM Mastery — From Zero to Maintainer

Phase 18 — Deep Dive: Performance Engineering

Contents

Guided reading list

Questions to answer as you read

Cross-references

Keyboard shortcuts

vLLM Mastery — From Zero to Maintainer

Phase 18 — Deep Dive: Performance Engineering

Contents

Guided reading list

Questions to answer as you read

Cross-references