Phase 15 — Interview Questions: Disaggregated Serving
Staff/principal-level questions on this topic. Cover the answer, attempt it OUT LOUD, then compare. (See CAREER.md for how to run a full mock loop.)
Q1. Why disaggregate prefill and decode?
Model answer
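They have different resource profiles: prefill is compute-bound (it processes the whole prompt in parallel), while decode is memory-bandwidth-bound (it emits one token per step and re-reads the weights and KV cache each time). Co-located, they interfere: a big prefill stalls ongoing decodes and causes latency spikes. Splitting them lets you scale and tune each fleet independently, with more compute for prefill to cut TTFT and more memory bandwidth or instances for decode throughput, at the cost of transferring the KV cache between them.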
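The resource-profile difference in this answer can be made concrete with a rough roofline-style estimate. The sketch below is a weights-only approximation (it ignores activation and KV cache traffic), and the accelerator figures are hypothetical round numbers, not the specs of any real device. The function names are illustrative.

```python
def arithmetic_intensity(tokens_per_step: int, bytes_per_param: float = 2.0) -> float:
    """FLOPs per byte of weights read in one forward step (weights-only approximation)."""
    # About 2 FLOPs (multiply + add) per parameter per token,
    # while each weight is read from memory once per step.
    return 2.0 * tokens_per_step / bytes_per_param


def regime(tokens_per_step: int, peak_flops: float, mem_bandwidth: float,
           bytes_per_param: float = 2.0) -> str:
    """Classify a step as compute-bound or memory-bound against the hardware ridge point."""
    ridge = peak_flops / mem_bandwidth  # FLOPs per byte where the two limits meet
    ai = arithmetic_intensity(tokens_per_step, bytes_per_param)
    return "compute-bound" if ai >= ridge else "memory-bound"


# Hypothetical accelerator (assumed values): 1 PFLOP/s peak, 3 TB/s memory bandwidth.
PEAK_FLOPS = 1.0e15
MEM_BW = 3.0e12

print(regime(2048, PEAK_FLOPS, MEM_BW))  # prefill over a 2048-token prompt
print(regime(1, PEAK_FLOPS, MEM_BW))     # decode, one token for a single sequence
```

Under these assumptions a long prefill lands well above the ridge point and a single-sequence decode step lands far below it. Batching many decode sequences raises `tokens_per_step`, which is one reason a decode fleet is tuned differently from a prefill fleet.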
Q2. What's the main cost/risk of disaggregation?
Model answer
Shipping the KV cache over the network adds latency and bandwidth pressure; it only pays off when interference savings exceed transfer cost. It also adds routing/orchestration complexity and new failure modes (for example, a decode node stalled waiting on remote KV).
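A back-of-the-envelope version of the "savings exceed transfer cost" test, as a minimal sketch. The model shape, link speed, and the interference-savings figures are all assumed values for illustration; the helper names are hypothetical, and a real system would also account for overlap of transfer with compute, compression, and queuing.

```python
def kv_cache_bytes(num_layers: int, num_kv_heads: int, head_dim: int,
                   seq_len: int, bytes_per_elem: int = 2) -> int:
    """Size of the KV cache for one sequence: keys and values for every layer and token."""
    # The leading factor of 2 covers the K tensor and the V tensor.
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * bytes_per_elem


def transfer_seconds(num_bytes: int, link_bytes_per_s: float) -> float:
    """Idealized transfer time: payload divided by link bandwidth, no protocol overhead."""
    return num_bytes / link_bytes_per_s


def disaggregation_pays_off(interference_saved_s: float, transfer_s: float) -> bool:
    """The break-even rule from the answer: savings must exceed the transfer cost."""
    return interference_saved_s > transfer_s


# Assumed model shape: 32 layers, 8 KV heads, head dim 128, fp16, 4096-token prompt.
size = kv_cache_bytes(num_layers=32, num_kv_heads=8, head_dim=128, seq_len=4096)
# Assumed link: 50 GB/s between prefill and decode nodes.
cost = transfer_seconds(size, link_bytes_per_s=50e9)

print(f"KV cache: {size / 2**20:.0f} MiB, transfer: {cost * 1e3:.1f} ms")
print(disaggregation_pays_off(interference_saved_s=0.050, transfer_s=cost))
print(disaggregation_pays_off(interference_saved_s=0.005, transfer_s=cost))
```

With these numbers the cache is 512 MiB and takes roughly 10.7 ms to move, so disaggregation wins if co-location would have cost a request more than that in decode stalls, and loses otherwise.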
Going deeper
The flagship phases (02, 03) show the depth and number of questions to expect for a topic you claim as your specialty.