Keyboard shortcuts

Press or to navigate between chapters

Press S or / to search in the book

Press ? to show this help

Press Esc to hide this help

Phase 16 — Deep Dive: Serving APIs & Parsers

Read this with upstream/ open. Every path is relative to upstream/ at the pinned commit v0.22.1 @ 0decac0 (UPSTREAM_PIN.md). If a line number ever drifts, search for the named symbol instead.

Contents


Guided reading list

Work through these in order. This is a scaffold: the reading targets and the questions are real; fill in the line-by-line annotations as you go (this is exactly the muscle a maintainer uses — reading unfamiliar code and extracting its contract).

  1. vllm/entrypoints/openai/api_server.py — The FastAPI app + routes.
    • Read it, then write 3 sentences in your lab notebook: what data structure, what invariant, what edge case.
  2. vllm/entrypoints/openai/serving_chat.py — Chat completions: templating, streaming, tools.
    • Read it, then write 3 sentences in your lab notebook: what data structure, what invariant, what edge case.
  3. vllm/entrypoints/openai/tool_parsers/ — Per-model tool-call parsers (the pluggable bit).
    • Read it, then write 3 sentences in your lab notebook: what data structure, what invariant, what edge case.
  4. vllm/entrypoints/openai/reasoning_parsers/ — Reasoning/think-tag parsers.
    • Read it, then write 3 sentences in your lab notebook: what data structure, what invariant, what edge case.
  5. vllm/entrypoints/ — Look for the Anthropic Messages + gRPC entrypoints.
    • Read it, then write 3 sentences in your lab notebook: what data structure, what invariant, what edge case.

Questions to answer as you read

  • The OpenAI-compatible server: /v1/chat/completions, /v1/completions, /v1/embeddings?
  • Chat templates and how messages become a token prompt?
  • Streaming via Server-Sent Events; delta semantics?
  • Tool/function calling: schema in, tool_calls out; the tool-call parsers?
  • Reasoning parsers (separating chain-of-thought from the answer)?
  • Anthropic Messages API and gRPC front-ends?

Cross-references