Phase 16 — Cheatsheet: Serving APIs & Parsers
- vllm serve -> FastAPI -> serving_chat -> AsyncLLM. Speaks OpenAI + Anthropic + gRPC.
- Chat template renders messages -> prompt text -> prompt tokens. Streaming sends deltas over SSE (Server-Sent Events).
- Tool/reasoning parsers are pluggable and chosen per model; under streaming they must parse partial output incrementally, since a tag or tool call can be split across deltas.
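
To make the partial-parse point concrete, here is a minimal Python sketch of an incremental splitter that routes streamed text into reasoning and content. It is not vLLM's parser: the `<think>`/`</think>` tags, the `StreamingThinkSplitter` class, and the `split_stream` helper are all hypothetical names chosen for illustration. The idea it shows is that a chunk ending in what might be the start of a tag has to be held back until the next delta resolves it.

```python
from dataclasses import dataclass

# Hypothetical tag pair; real models and parsers may use different markers.
START, END = "<think>", "</think>"


@dataclass
class StreamingThinkSplitter:
    in_reasoning: bool = False
    pending: str = ""  # text held back because it may be the start of a tag

    def feed(self, delta: str) -> tuple[str, str]:
        """Consume one streamed chunk; return (reasoning_delta, content_delta)."""
        buf = self.pending + delta
        self.pending = ""
        reasoning, content = [], []
        while buf:
            tag = END if self.in_reasoning else START
            out = reasoning if self.in_reasoning else content
            idx = buf.find(tag)
            if idx != -1:
                # Full tag found: emit text before it, then switch mode.
                out.append(buf[:idx])
                buf = buf[idx + len(tag):]
                self.in_reasoning = not self.in_reasoning
                continue
            # No full tag: hold back the longest suffix that is a proper
            # prefix of the tag, since the rest may arrive in the next chunk.
            keep = 0
            for n in range(min(len(tag) - 1, len(buf)), 0, -1):
                if buf.endswith(tag[:n]):
                    keep = n
                    break
            out.append(buf[:len(buf) - keep])
            self.pending = buf[len(buf) - keep:]
            buf = ""
        return "".join(reasoning), "".join(content)

    def flush(self) -> tuple[str, str]:
        """At end of stream, emit any held-back text as ordinary output."""
        rest, self.pending = self.pending, ""
        return (rest, "") if self.in_reasoning else ("", rest)


def split_stream(chunks: list[str]) -> tuple[str, str]:
    """Run the splitter over all chunks and join the per-chunk deltas."""
    splitter = StreamingThinkSplitter()
    reasoning, content = [], []
    for chunk in chunks:
        r, c = splitter.feed(chunk)
        reasoning.append(r)
        content.append(c)
    r, c = splitter.flush()
    reasoning.append(r)
    content.append(c)
    return "".join(reasoning), "".join(content)


# Both tags are split across chunk boundaries here.
print(split_stream(["<th", "ink>step", " one</th", "ink>Answer", ": 4"]))
# -> ('step one', 'Answer: 4')
```

Holding back only the ambiguous suffix keeps latency low: everything that cannot be part of a tag is emitted immediately. Tool-call parsing under streaming faces the same problem with partial JSON instead of partial tags.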
Key upstream files
- vllm/entrypoints/openai/api_server.py
- vllm/entrypoints/openai/serving_chat.py
- vllm/entrypoints/openai/tool_parsers/
- vllm/entrypoints/openai/reasoning_parsers/
- vllm/entrypoints/
Full reference: 00-guide.md · 01-deep-dive.md