Phase 15 — Cheatsheet: Disaggregated Serving

  • Prefill fleet (compute-bound) -> KV transfer -> decode fleet (memory-bandwidth-bound). Tune each fleet separately.
  • KV connectors abstract the transfer; the same abstraction is also used for KV offloading and cross-engine cache sharing.
  • Scheduler state WAITING_FOR_REMOTE_KVS gates decode: a request is not scheduled for decode until its KV has arrived.
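
The gating behaviour in the last bullet can be sketched as a toy state machine. This is an illustrative sketch only, not vLLM's scheduler code: apart from the state name WAITING_FOR_REMOTE_KVS, every class and method here (ToyDecodeScheduler, add_remote_prefill_request, on_kv_received, schedule) is hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class Status(Enum):
    WAITING = auto()                 # ready to be picked up by the scheduler
    WAITING_FOR_REMOTE_KVS = auto()  # KV still in flight from the prefill side
    RUNNING = auto()                 # actively decoding


@dataclass
class Request:
    req_id: str
    status: Status = Status.WAITING


@dataclass
class ToyDecodeScheduler:
    """Hypothetical decode-side scheduler; names are assumptions for illustration."""
    requests: dict = field(default_factory=dict)

    def add_remote_prefill_request(self, req_id: str) -> None:
        # The prompt was prefilled elsewhere, so park the request until KV arrives.
        self.requests[req_id] = Request(req_id, Status.WAITING_FOR_REMOTE_KVS)

    def on_kv_received(self, req_id: str) -> None:
        # Called when the (hypothetical) connector reports the transfer is done.
        req = self.requests[req_id]
        if req.status is Status.WAITING_FOR_REMOTE_KVS:
            req.status = Status.WAITING

    def schedule(self) -> list:
        # Only requests whose KV is local are eligible for a decode step.
        scheduled = []
        for req in self.requests.values():
            if req.status is Status.WAITING:
                req.status = Status.RUNNING
            if req.status is Status.RUNNING:
                scheduled.append(req.req_id)
        return scheduled


sched = ToyDecodeScheduler()
sched.add_remote_prefill_request("req-a")
before = sched.schedule()   # request is gated, nothing to decode yet
sched.on_kv_received("req-a")
after = sched.schedule()    # KV has arrived, request is now scheduled
print(before, after)
```

In this sketch the gate is a status check inside schedule(); the real scheduler in vllm/v1/core/sched/scheduler.py should be consulted for how the transition is actually driven.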

Key upstream files and directories

  • vllm/distributed/kv_transfer/
  • vllm/distributed/kv_transfer/kv_connector/v1/
  • vllm/v1/core/sched/scheduler.py
  • examples/

Full reference: 00-guide.md · 01-deep-dive.md