Phase 17 — Deep Dive: Hardware Backends & Plugins
Read this with upstream/ open. Every path is relative to upstream/ at the pinned commit v0.22.1 @ 0decac0 (UPSTREAM_PIN.md). If a line number ever drifts, search for the named symbol instead.
Guided reading list
Work through these in order. This is a scaffold: the reading targets and the questions are real; fill in the line-by-line annotations as you go (this is exactly the muscle a maintainer uses — reading unfamiliar code and extracting its contract).
- vllm/platforms/interface.py: the Platform base class, the contract every backend implements.
  - Read it, then write 3 sentences in your lab notebook: what data structure, what invariant, what edge case.
- vllm/platforms/cuda.py: the NVIDIA platform.
  - Read it, then write 3 sentences in your lab notebook: what data structure, what invariant, what edge case.
- vllm/platforms/cpu.py: the CPU platform. Read this one closely; you can run it on a laptop.
  - Read it, then write 3 sentences in your lab notebook: what data structure, what invariant, what edge case.
- vllm/platforms/__init__.py: platform detection/resolution plus plugin discovery.
  - Read it, then write 3 sentences in your lab notebook: what data structure, what invariant, what edge case.
- vllm/plugins/: the plugin loading mechanism.
  - Read it, then write 3 sentences in your lab notebook: what data structure, what invariant, what edge case.
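Before reading the detection and plugin code, it can help to have a toy model of entry-point discovery in your head. The sketch below is not the upstream implementation. The group name `toy_backend.platform_plugins`, the function names, and the "hook returns a class path or None" convention are all assumptions made for illustration; check each one against what you actually find in vllm/platforms/__init__.py and vllm/plugins/. Only `importlib.metadata.entry_points` is a real standard-library call (the `group=` keyword needs Python 3.10+).

```python
from importlib.metadata import entry_points

# Hypothetical group name, chosen for this sketch only.
GROUP = "toy_backend.platform_plugins"


def discover():
    """Return the entry points that installed packages registered under GROUP."""
    return entry_points(group=GROUP)


def resolve_platform(candidates):
    """Pick the active platform from a set of entry-point-like objects.

    Assumed convention: each entry point loads to a zero-argument hook that
    returns a fully qualified class path (a string) when its hardware is
    usable, or None when it is not.
    """
    active = {}
    for ep in candidates:
        hook = ep.load()      # import the plugin module, fetch the hook
        result = hook()       # ask the plugin whether it applies here
        if result is not None:
            active[ep.name] = result

    # Assumption of this sketch: two plugins claiming the machine is an error.
    if len(active) > 1:
        raise RuntimeError(f"multiple platform plugins active: {sorted(active)}")

    # Exactly one winner, or None so the caller can fall back to built-ins.
    return next(iter(active.values()), None)
```

A caller would run `resolve_platform(discover())` once at startup and fall back to built-in detection when it returns None. As you read the real code, note where it differs: how it orders built-in versus out-of-tree platforms, and what it does on a conflict.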
Questions to answer as you read
- What does the Platform abstraction expose: device type, default attention backend, capabilities?
- How does the engine query the platform instead of hardcoding CUDA?
- How does the out-of-tree plugin system (entry points) let new hardware register itself?
- What changes in the CPU backend (no paging kernels? threading? dtype support)?
- Why are some features platform-gated (FP8, CUDA graphs, certain kernels)?
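To make the first, second, and last questions concrete, here is a minimal sketch of the pattern to look for: a base class that states the contract, per-backend subclasses that fill it in, and engine-side code that asks the platform object rather than comparing against "cuda". Every name here (`ToyPlatform`, `Feature`, `plan_engine`, the backend strings) is hypothetical, and the capability sets are made up for the example. They are not claims about what the real CUDA or CPU platforms support at the pinned commit.

```python
from enum import Enum, auto


class Feature(Enum):
    """Hypothetical capability flags a platform may advertise."""
    FP8 = auto()
    CUDA_GRAPHS = auto()


class ToyPlatform:
    """Hypothetical base class: the contract each backend fills in."""
    device_type = "unknown"
    default_attention_backend = "unknown"
    supported_features = frozenset()

    def supports(self, feature):
        return feature in self.supported_features


class ToyCudaPlatform(ToyPlatform):
    device_type = "cuda"
    default_attention_backend = "flash"  # illustrative value only
    supported_features = frozenset({Feature.FP8, Feature.CUDA_GRAPHS})


class ToyCpuPlatform(ToyPlatform):
    device_type = "cpu"
    default_attention_backend = "sdpa"  # illustrative value only
    supported_features = frozenset()


def plan_engine(platform, want_fp8):
    """Engine-side planning: query the platform, never test for 'cuda' here."""
    if want_fp8 and not platform.supports(Feature.FP8):
        # Platform gating: fail early with a clear message instead of
        # crashing later inside a kernel that does not exist on this device.
        raise ValueError(f"FP8 is not supported on {platform.device_type}")
    return {
        "device": platform.device_type,
        "attention": platform.default_attention_backend,
        "use_graphs": platform.supports(Feature.CUDA_GRAPHS),
    }
```

The design point to verify in the real code is where the gating lives: if the engine only ever calls methods on the platform object, then adding new hardware means adding a subclass, not editing the engine.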
Cross-references
- Intuition: 00-guide.md
- Build it yourself: 02-mini-build.md
- The gold-standard depth to emulate: Phase 02 deep-dive.