Phase 16 — Exercises: Serving APIs & Parsers
Work these after the labs. They escalate from "explain it" to "design it" — staff-level means you can do the last ones cold.
- Why are streaming tool-call parsers hard (partial JSON across deltas)?
- How does a chat template turn messages into a single token sequence?
- What must be true for vLLM to be a drop-in replacement for the OpenAI API?
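
As a warm-up for the first question, here is a minimal sketch of the partial-JSON problem. It assumes the tool-call arguments arrive as plain string fragments that must be concatenated before they parse. The function name `accumulate_tool_call` and the sample fragments are hypothetical, invented for illustration, and not taken from any upstream parser.

```python
import json


def accumulate_tool_call(fragments):
    """Concatenate argument fragments, attempting a parse after each one.

    Returns (parsed_args, failed_attempts). Hypothetical helper for
    illustration only; real parsers track far more state than this.
    """
    buffer = ""
    failed_attempts = 0
    parsed = None
    for fragment in fragments:
        buffer += fragment
        try:
            parsed = json.loads(buffer)
        except json.JSONDecodeError:
            # The buffer so far is not a complete JSON document yet.
            failed_attempts += 1
            parsed = None
    return parsed, failed_attempts


# Made-up fragments: the split points fall inside a string value and a key.
fragments = ['{"city": "Par', 'is", "uni', 't": "celsius"}']
args, failed = accumulate_tool_call(fragments)
print(args, failed)
```

With these fragments, the first two parse attempts fail and only the third succeeds. A streaming parser cannot simply wait for that moment, because clients expect incremental output, which is where the difficulty the question asks about comes from.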
Self-grading
For each: could you (a) explain it to a teammate in 2 minutes, and (b) point to the exact
upstream/ file that proves your answer? If not, re-read the matching anchor in
01-deep-dive.md.