Phase 15 — Cheatsheet: Disaggregated Serving
- Prefill fleet (compute-bound) -> KV transfer -> decode fleet (memory-bandwidth-bound). Tune each fleet separately.
- KV connectors abstract the transfer (also used for offloading / cross-engine cache).
- Scheduler state WAITING_FOR_REMOTE_KVS gates decode until KV arrives.
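
The gating in the last bullet can be pictured with a toy model. This is a minimal sketch, not vLLM's scheduler: only the state name WAITING_FOR_REMOTE_KVS comes from the notes above, while ToyDecodeScheduler, Request, needs_remote_kv, and on_kv_arrived are hypothetical names made up for illustration.

```python
from collections import deque
from dataclasses import dataclass
from enum import Enum, auto


class RequestStatus(Enum):
    WAITING = auto()
    WAITING_FOR_REMOTE_KVS = auto()  # state name taken from the cheatsheet
    RUNNING = auto()


@dataclass
class Request:
    req_id: str
    needs_remote_kv: bool = False  # hypothetical flag: KV is produced by a prefill worker
    status: RequestStatus = RequestStatus.WAITING


class ToyDecodeScheduler:
    """Toy decode-side scheduler: a request runs only once its KV is local."""

    def __init__(self) -> None:
        self.waiting: deque[Request] = deque()
        self.running: list[Request] = []
        self.arrived: set[str] = set()  # ids whose KV transfer has completed

    def add_request(self, req: Request) -> None:
        self.waiting.append(req)

    def on_kv_arrived(self, req_id: str) -> None:
        # Hypothetical callback standing in for the connector reporting
        # that a transfer has finished.
        self.arrived.add(req_id)

    def schedule(self) -> list[str]:
        """Run one scheduling step and return the ids currently running."""
        still_waiting: deque[Request] = deque()
        while self.waiting:
            req = self.waiting.popleft()
            if req.needs_remote_kv and req.req_id not in self.arrived:
                # KV has not landed yet: park the request, skip it this step.
                req.status = RequestStatus.WAITING_FOR_REMOTE_KVS
                still_waiting.append(req)
                continue
            req.status = RequestStatus.RUNNING
            self.running.append(req)
        self.waiting = still_waiting
        return [r.req_id for r in self.running]
```

In this sketch a parked request does not block the ones queued behind it; it is retried on every step and starts decoding on the first step after on_kv_arrived is called for its id.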
Key upstream files
- vllm/distributed/kv_transfer/
- vllm/distributed/kv_transfer/kv_connector/v1/
- vllm/v1/core/sched/scheduler.py
- examples/
Full reference: 00-guide.md · 01-deep-dive.md