Phase 17 — Deep Dive: Hardware Backends & Plugins

Read this with upstream/ open. Every path is relative to upstream/ at the pinned commit v0.22.1 @ 0decac0 (UPSTREAM_PIN.md). If a line number ever drifts, search for the named symbol instead.


Guided reading list

Work through these in order. This is a scaffold: the reading targets and the questions are real; fill in the line-by-line annotations as you go (this is exactly the muscle a maintainer uses — reading unfamiliar code and extracting its contract).

  1. vllm/platforms/interface.py — The Platform base class — the contract every backend implements.
    • Read it, then write 3 sentences in your lab notebook: what data structure, what invariant, what edge case.
  2. vllm/platforms/cuda.py — The NVIDIA platform.
    • Read it, then write 3 sentences in your lab notebook: what data structure, what invariant, what edge case.
  3. vllm/platforms/cpu.py — The CPU platform — read this; you can run it on a laptop.
    • Read it, then write 3 sentences in your lab notebook: what data structure, what invariant, what edge case.
  4. vllm/platforms/__init__.py — Platform detection/resolution + plugin discovery.
    • Read it, then write 3 sentences in your lab notebook: what data structure, what invariant, what edge case.
  5. vllm/plugins/ — The plugin loading mechanism.
    • Read it, then write 3 sentences in your lab notebook: what data structure, what invariant, what edge case.

Questions to answer as you read

  • What does the Platform abstraction expose (device type, default attention backend, capabilities)?
  • How does the engine query the platform instead of hardcoding CUDA?
  • How does the out-of-tree plugin system (entry points) add support for new hardware?
  • What changes in the CPU backend (no paging kernels? threading? dtype support)?
  • Why are some features platform-gated (FP8, CUDA graphs, certain kernels)?
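
To make the questions above concrete, here is a toy model of the pattern they point at: the engine asks a platform object what it can do instead of branching on "is this CUDA?". Every name in it (`ToyPlatform`, `supports_fp8`, `plan_engine`, the backend strings) is hypothetical and chosen for this sketch; it is not the interface in `vllm/platforms/interface.py`, and the compute-capability threshold is illustrative only.

```python
from abc import ABC, abstractmethod
from typing import Dict, Tuple


class ToyPlatform(ABC):
    """Hypothetical stand-in for a platform contract (not vLLM's real class)."""

    device_type: str  # e.g. "cuda" or "cpu"

    @abstractmethod
    def default_attention_backend(self) -> str:
        """Name of the attention implementation to use when none is requested."""

    @abstractmethod
    def supports_fp8(self) -> bool:
        """Whether FP8 storage/compute is usable on this device."""

    def supports_graph_capture(self) -> bool:
        # Conservative default: a backend must opt in explicitly.
        return False


class ToyCudaPlatform(ToyPlatform):
    device_type = "cuda"

    def __init__(self, compute_capability: Tuple[int, int]):
        self.compute_capability = compute_capability

    def default_attention_backend(self) -> str:
        return "toy-flash"

    def supports_fp8(self) -> bool:
        # Illustrative threshold for this sketch, not taken from the source.
        return self.compute_capability >= (8, 9)

    def supports_graph_capture(self) -> bool:
        return True


class ToyCpuPlatform(ToyPlatform):
    device_type = "cpu"

    def default_attention_backend(self) -> str:
        return "toy-sdpa"

    def supports_fp8(self) -> bool:
        return False


def plan_engine(platform: ToyPlatform, want_fp8: bool) -> Dict[str, object]:
    """Engine-side code: no device names appear here, only capability queries."""
    use_fp8 = want_fp8 and platform.supports_fp8()
    return {
        "device": platform.device_type,
        "attention": platform.default_attention_backend(),
        "kv_cache_dtype": "fp8" if use_fp8 else "auto",
        "use_graphs": platform.supports_graph_capture(),
    }


if __name__ == "__main__":
    for p in (ToyCpuPlatform(), ToyCudaPlatform((8, 0)), ToyCudaPlatform((9, 0))):
        print(plan_engine(p, want_fp8=True))
```

Note that `plan_engine` silently downgrades an unsupported FP8 request. Whether the real engine downgrades, warns, or raises in that situation is one of the things to find out as you read.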

Cross-references