Phase 15 — Deep Dive: Disaggregated Serving

Read this with upstream/ open. Every path is relative to upstream/ at the pinned commit v0.22.1 @ 0decac0 (UPSTREAM_PIN.md). If a line number ever drifts, search for the named symbol instead.

Guided reading list

Work through these in order. This is a scaffold: the reading targets and the questions are real, but the line-by-line annotations are yours to fill in as you go. That is exactly the muscle a maintainer uses: reading unfamiliar code and extracting its contract.

  1. vllm/distributed/kv_transfer/ — The KV connector framework (the heart of disagg).
    • Read it, then write 3 sentences in your lab notebook: what data structure, what invariant, what edge case.
  2. vllm/distributed/kv_transfer/kv_connector/v1/ — V1 connectors (base + implementations).
    • Read it, then write 3 sentences in your lab notebook: what data structure, what invariant, what edge case.
  3. vllm/v1/core/sched/scheduler.py — Search for 'connector' and 'WAITING_FOR_REMOTE_KVS' to see the async KV load path.
    • Read it, then write 3 sentences in your lab notebook: what data structure, what invariant, what edge case.
  4. examples/ — Look for disaggregated-prefill example scripts/configs.
    • Read it, then write 3 sentences in your lab notebook: what data structure, what invariant, what edge case.
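
Several of the targets above are found by searching for a symbol rather than by line number. A minimal sketch of such a search, assuming the pinned checkout sits in a local upstream/ directory; the helper name find_symbol and its parameters are made up for this guide and are not part of vLLM:

```python
from pathlib import Path
from typing import Iterator, Tuple


def find_symbol(root: str, needle: str, suffix: str = ".py") -> Iterator[Tuple[str, int, str]]:
    """Yield (relative path, 1-based line number, stripped line) for every
    line under `root` that contains `needle` as a plain substring."""
    base = Path(root)
    for path in sorted(base.rglob(f"*{suffix}")):
        try:
            text = path.read_text(encoding="utf-8", errors="replace")
        except OSError:
            continue  # unreadable file: skip it rather than abort the scan
        for lineno, line in enumerate(text.splitlines(), start=1):
            if needle in line:
                yield str(path.relative_to(base)), lineno, line.strip()


if __name__ == "__main__":
    # Assumption: run from the directory that contains upstream/.
    for rel, lineno, line in find_symbol("upstream/vllm/v1/core/sched", "WAITING_FOR_REMOTE_KVS"):
        print(f"{rel}:{lineno}: {line}")
```

The match is a case-sensitive substring test, so 'connector' will also hit identifiers that merely contain it. That is usually what you want on a first pass through unfamiliar code.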

Questions to answer as you read

  • Why does co-locating prefill and decode cause interference (prefill stalls decodes)?
  • How is a request handed off along the path prefill node -> KV transfer -> decode node?
  • What do KV connectors abstract about the transfer (NIXL, shared storage, etc.)?
  • How does encode disaggregation work for multimodal inputs?
  • How are requests routed or proxied between the P and D fleets, and how is load balanced?
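
Before reading the code for the first question, it can help to put numbers on "prefill stalls decodes". The toy model below is only a sketch under stated assumptions: step costs are constant, a prefill adds a fixed delay to whichever step it joins, and the cost of the KV transfer itself is not modeled. The names StepCosts, colocated_gaps and disaggregated_gaps are invented for this guide and do not appear in vLLM.

```python
from dataclasses import dataclass


@dataclass
class StepCosts:
    decode_ms: float   # time for one decode-only step (assumed constant)
    prefill_ms: float  # extra time a prefill adds to the step it joins (assumed)


def colocated_gaps(costs: StepCosts, n_steps: int, prefill_every: int) -> list[float]:
    """Inter-token gaps seen by a running decode when a new prefill
    joins every `prefill_every`-th step on the same engine."""
    gaps = []
    for step in range(n_steps):
        gap = costs.decode_ms
        if prefill_every > 0 and step % prefill_every == 0:
            gap += costs.prefill_ms  # the decode token waits for the whole mixed step
        gaps.append(gap)
    return gaps


def disaggregated_gaps(costs: StepCosts, n_steps: int) -> list[float]:
    """On a decode-only node no step carries a prefill, so every gap is one decode step."""
    return [costs.decode_ms] * n_steps


if __name__ == "__main__":
    costs = StepCosts(decode_ms=20.0, prefill_ms=180.0)  # illustrative numbers, not measurements
    co = colocated_gaps(costs, n_steps=8, prefill_every=4)
    dis = disaggregated_gaps(costs, n_steps=8)
    print("co-located    max/mean gap:", max(co), sum(co) / len(co))
    print("disaggregated max/mean gap:", max(dis), sum(dis) / len(dis))
```

With these made-up costs the co-located decode sees gaps of 200 ms on the steps shared with a prefill and 20 ms otherwise, while the decode-only node stays at 20 ms throughout. The model says nothing about what the handoff costs, which is what the remaining questions ask you to find in the code.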

Cross-references