Keyboard shortcuts

Press or to navigate between chapters

Press S or / to search in the book

Press ? to show this help

Press Esc to hide this help

Phase 16 — Cheatsheet: Serving APIs & Parsers

  • vllm serve -> FastAPI -> serving_chat -> AsyncLLM. Speaks OpenAI + Anthropic + gRPC.
  • Chat template turns messages -> prompt tokens. SSE for streaming deltas.
  • Tool/reasoning parsers are pluggable and per-model; streaming makes them partial-parse.

Key upstream files

  • vllm/entrypoints/openai/api_server.py
  • vllm/entrypoints/openai/serving_chat.py
  • vllm/entrypoints/openai/tool_parsers/
  • vllm/entrypoints/openai/reasoning_parsers/
  • vllm/entrypoints/

Full reference: 00-guide.md · 01-deep-dive.md