Phase 15 — Deep Dive: Disaggregated Serving
Read this with upstream/ open. Every path is relative to upstream/ at the pinned commit v0.22.1 @ 0decac0 (see UPSTREAM_PIN.md). If a line number ever drifts, search for the named symbol instead.
Guided reading list
Work through these in order. This is a scaffold: the reading targets and the questions are real; fill in the line-by-line annotations as you go (this is exactly the muscle a maintainer uses — reading unfamiliar code and extracting its contract).
- vllm/distributed/kv_transfer/: The KV connector framework (the heart of disagg).
  - Read it, then write 3 sentences in your lab notebook: what data structure, what invariant, what edge case.
- vllm/distributed/kv_transfer/kv_connector/v1/: V1 connectors (base + implementations).
  - Read it, then write 3 sentences in your lab notebook: what data structure, what invariant, what edge case.
- vllm/v1/core/sched/scheduler.py: Search 'connector' / 'WAITING_FOR_REMOTE_KVS' to see async KV load.
  - Read it, then write 3 sentences in your lab notebook: what data structure, what invariant, what edge case.
- examples/: Look for disaggregated-prefill example scripts/configs.
  - Read it, then write 3 sentences in your lab notebook: what data structure, what invariant, what edge case.
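The symbol searches suggested in this list can be done with any grep-like tool. As one option, here is a minimal Python sketch. The function name `find_symbol` and the `./upstream` checkout location are assumptions made for illustration; they are not part of the upstream repository.

```python
from pathlib import Path


def find_symbol(root, needle, suffix=".py"):
    """Return (relative_path, line_number, stripped_line) for each line containing needle."""
    root = Path(root)
    hits = []
    for path in sorted(root.rglob(f"*{suffix}")):
        try:
            text = path.read_text(encoding="utf-8", errors="replace")
        except OSError:
            continue  # unreadable file: skip it rather than abort the search
        for lineno, line in enumerate(text.splitlines(), start=1):
            if needle in line:
                rel = path.relative_to(root).as_posix()
                hits.append((rel, lineno, line.strip()))
    return hits


if __name__ == "__main__":
    # Assumption: the pinned checkout lives at ./upstream (adjust to your layout).
    for rel, lineno, line in find_symbol("upstream/vllm/v1/core/sched",
                                         "WAITING_FOR_REMOTE_KVS"):
        print(f"{rel}:{lineno}: {line}")
```

Searching by symbol rather than by line number keeps your notebook annotations valid if the pin is ever moved.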
Questions to answer as you read
- Why does co-locating prefill and decode cause interference (prefill stalls decodes)?
- How is a request handed off along the path prefill node -> KV transfer -> decode node?
- What do KV connectors abstract about the transfer (NIXL, shared storage, etc.)?
- How does encode disaggregation work for multimodal models?
- How are requests routed or proxied between the P (prefill) and D (decode) fleets, and how is load balanced?
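Before reading the real code, it can help to keep a toy model of the handoff in mind. The sketch below is not vLLM's API: every name in it (`InMemoryKVStore`, `PrefillWorker`, `DecodeWorker`, `Proxy`) is hypothetical. It uses a synchronous in-memory dictionary in place of a real transfer backend and round-robin in place of a real load-balancing policy. It only illustrates the shape of the flow: the proxy picks a prefill worker and a decode worker, prefill publishes KV under the request ID, and decode consumes it.

```python
from dataclasses import dataclass
from itertools import cycle


class InMemoryKVStore:
    """Hypothetical stand-in for a KV transfer backend (not a vLLM class)."""

    def __init__(self):
        self._blocks = {}

    def save(self, request_id, kv):
        self._blocks[request_id] = kv

    def load(self, request_id):
        # pop(): each request's KV is consumed exactly once by the decode side.
        return self._blocks.pop(request_id)


@dataclass
class PrefillWorker:
    name: str
    store: InMemoryKVStore

    def prefill(self, request_id, prompt_tokens):
        # Fake "KV cache": one entry per prompt token.
        kv = [("kv", t) for t in prompt_tokens]
        self.store.save(request_id, kv)
        return len(kv)


@dataclass
class DecodeWorker:
    name: str
    store: InMemoryKVStore

    def decode(self, request_id, max_new_tokens):
        kv = self.store.load(request_id)
        out = []
        for _ in range(max_new_tokens):
            token = len(kv)  # fake "model": next token = current context length
            out.append(token)
            kv.append(("kv", token))  # decode extends the cache it received
        return out


class Proxy:
    """Toy router: round-robin over both fleets (an assumed policy)."""

    def __init__(self, prefillers, decoders):
        self._prefillers = cycle(prefillers)
        self._decoders = cycle(decoders)

    def handle(self, request_id, prompt_tokens, max_new_tokens):
        p = next(self._prefillers)
        d = next(self._decoders)
        p.prefill(request_id, prompt_tokens)          # P: build and publish KV
        tokens = d.decode(request_id, max_new_tokens)  # D: load KV, then generate
        return p.name, d.name, tokens


if __name__ == "__main__":
    store = InMemoryKVStore()
    proxy = Proxy([PrefillWorker("p0", store), PrefillWorker("p1", store)],
                  [DecodeWorker("d0", store)])
    print(proxy.handle("r1", [10, 11, 12], 2))
```

As you answer the questions, note where the real code differs from this toy. In particular, the scheduler reading target mentions an asynchronous KV load, whereas `load` here returns immediately.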
Cross-references
- Intuition: 00-guide.md
- Build it yourself: 02-mini-build.md
- The gold-standard depth to emulate: Phase 02 deep-dive.