Phase 16 — Deep Dive: Serving APIs & Parsers
Read this with
upstream/open. Every path is relative toupstream/at the pinned commitv0.22.1 @ 0decac0(UPSTREAM_PIN.md). If a line number ever drifts, search for the named symbol instead.
Contents
Guided reading list
Work through these in order. This is a scaffold: the reading targets and the questions are real; fill in the line-by-line annotations as you go (this is exactly the muscle a maintainer uses — reading unfamiliar code and extracting its contract).
vllm/entrypoints/openai/api_server.py— The FastAPI app + routes.- Read it, then write 3 sentences in your lab notebook: what data structure, what invariant, what edge case.
vllm/entrypoints/openai/serving_chat.py— Chat completions: templating, streaming, tools.- Read it, then write 3 sentences in your lab notebook: what data structure, what invariant, what edge case.
vllm/entrypoints/openai/tool_parsers/— Per-model tool-call parsers (the pluggable bit).- Read it, then write 3 sentences in your lab notebook: what data structure, what invariant, what edge case.
vllm/entrypoints/openai/reasoning_parsers/— Reasoning/think-tag parsers.- Read it, then write 3 sentences in your lab notebook: what data structure, what invariant, what edge case.
vllm/entrypoints/— Look for the Anthropic Messages + gRPC entrypoints.- Read it, then write 3 sentences in your lab notebook: what data structure, what invariant, what edge case.
Questions to answer as you read
- The OpenAI-compatible server: /v1/chat/completions, /v1/completions, /v1/embeddings?
- Chat templates and how messages become a token prompt?
- Streaming via Server-Sent Events; delta semantics?
- Tool/function calling: schema in, tool_calls out; the tool-call parsers?
- Reasoning parsers (separating chain-of-thought from the answer)?
- Anthropic Messages API and gRPC front-ends?
Cross-references
- Intuition: 00-guide.md
- Build it yourself: 02-mini-build.md
- The gold-standard depth to emulate: Phase 02 deep-dive.