Phase 17 — Hardware Backends & Plugins
← Phase 16 · Course home · Phase 18 →
Contents
- Don't Panic
- Why this phase matters
- What you'll learn
- The map: where this lives in the real code
- Labs in this phase
- How to work this phase
- Where you are
Don't Panic
vLLM runs on NVIDIA, AMD, CPUs, TPUs, Gaudi, and more. It does this by hiding hardware differences behind a Platform abstraction and a plugin system, so the engine code stays hardware-agnostic and new accelerators arrive as plugins. This phase studies that abstraction, and you'll run the CPU backend with no GPU at all.
Why this phase matters
Hardware breadth is a strategic advantage (it hedges against GPU supply constraints and enables cost arbitrage), and the Platform abstraction is a clean piece of architecture worth studying. Knowing where the seams are lets you reason about porting and about why a feature is available on one backend but not another.
What you'll learn
- The Platform abstraction: device type, attention backend default, capabilities
- How the engine queries the platform instead of hardcoding CUDA
- The out-of-tree plugin system (entry points) for new hardware
- CPU backend: what changes (are paging kernels still used? how is threading handled? which dtypes are supported?)
- Why some features are platform-gated (FP8, CUDA graphs, certain kernels)
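To make the first, second, and last bullets concrete, here is a minimal sketch of the pattern. It is not vLLM's actual Platform class: the class names, method names, backend labels ("flash", "sdpa"), and capability values are all hypothetical and chosen only for illustration. The point is the shape: the engine asks the platform a question instead of checking for CUDA.

```python
from abc import ABC, abstractmethod


class Platform(ABC):
    """Hypothetical, simplified contract (NOT vLLM's real interface)."""

    device_type: str = "unknown"

    @abstractmethod
    def default_attention_backend(self) -> str:
        """Name of the attention implementation to use by default."""

    # Capabilities default to "no"; a backend opts in by overriding.
    def supports_fp8(self) -> bool:
        return False

    def supports_graph_capture(self) -> bool:
        return False


class ToyCudaPlatform(Platform):
    device_type = "cuda"

    def default_attention_backend(self) -> str:
        return "flash"  # illustrative label only

    def supports_fp8(self) -> bool:
        return True

    def supports_graph_capture(self) -> bool:
        return True


class ToyCpuPlatform(Platform):
    device_type = "cpu"

    def default_attention_backend(self) -> str:
        return "sdpa"  # illustrative label only


def plan_engine(platform: Platform, want_fp8: bool) -> dict:
    """Engine-side logic: it queries the platform and never names a vendor."""
    if want_fp8 and not platform.supports_fp8():
        # This is what "platform-gated" means in this sketch.
        raise ValueError(f"FP8 is not supported on {platform.device_type}")
    return {
        "device": platform.device_type,
        "attention": platform.default_attention_backend(),
        "graph_capture": platform.supports_graph_capture(),
        "kv_dtype": "fp8" if want_fp8 else "auto",
    }
```

Calling `plan_engine(ToyCpuPlatform(), want_fp8=False)` yields a plan with graph capture off, while asking the toy CPU platform for FP8 raises a `ValueError`. When you read interface.py, look for the real equivalents of these capability questions.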
The map: where this lives in the real code
Open these in upstream/ (pinned to v0.22.1 @ 0decac0; see UPSTREAM_PIN.md). The deep-dive (01-deep-dive.md) walks through the important ones line by line.
- vllm/platforms/interface.py: The Platform base class, the contract every backend implements.
- vllm/platforms/cuda.py: The NVIDIA platform.
- vllm/platforms/cpu.py: The CPU platform. Read this; you can run it on a laptop.
- vllm/platforms/__init__.py: Platform detection/resolution + plugin discovery.
- vllm/plugins/: The plugin loading mechanism.
Labs in this phase
- lab-01-platform-abstraction [CPU-OK]: build the Platform interface, registry, and resolver (CPU floor + loud override), then register an out-of-tree platform and change the engine's decisions with zero core edits, plus the duplicate-registration supply-chain guard.
- lab-02-run-cpu-vllm [CPU-OK]: run vLLM on laptop cores and read cpu.py against lab-01's interface, with every override checked off and the Phase 1–3 engine untouched. Captured output included.
See labs/README.md for how to run them.
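Before opening lab-01, it may help to see the three ideas in its description side by side. The sketch below is not the lab's solution; the class, its method names, and the priority scheme are hypothetical. It interprets "CPU floor" as CPU always being registered and available, "loud override" as an explicit choice that logs a warning, and the duplicate-registration guard as refusing a second registration under an existing name so a later package cannot silently shadow an earlier one.

```python
import logging

logger = logging.getLogger("toy_platforms")


class PlatformRegistry:
    """Hypothetical registry and resolver (this sketch's own design)."""

    def __init__(self):
        # CPU floor: always present, always available, lowest priority.
        self._entries = {"cpu": (lambda: True, 0)}

    def register(self, name, is_available, priority):
        """Add a platform; `is_available` is a zero-argument probe."""
        if name in self._entries:
            # Duplicate guard: never let a second package replace a name.
            raise ValueError(f"platform {name!r} is already registered")
        self._entries[name] = (is_available, priority)

    def resolve(self, override=None):
        """Return the chosen platform name."""
        if override is not None:
            if override not in self._entries:
                raise KeyError(f"unknown platform override {override!r}")
            # Loud override: the choice is honored but never silent.
            logger.warning("platform override in effect: %s", override)
            return override
        available = [
            (priority, name)
            for name, (probe, priority) in self._entries.items()
            if probe()
        ]
        # The CPU floor guarantees `available` is never empty.
        return max(available)[1]
```

With only the floor registered, `resolve()` returns `"cpu"`. Registering an available `"toy_npu"` at a higher priority changes the answer without touching the resolver, which is the "zero core edits" property the lab asks you to demonstrate.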
How to work this phase
- Read this guide for intuition.
- Read 01-deep-dive.md with the upstream/ files open.
- Do 02-mini-build.md: build the mini_vllm piece yourself.
- Run the labs, then attempt EXERCISES.md.
- Self-test with INTERVIEW.md; keep CHEATSHEET.md handy.
Where you are
This is one of the scaffolded phases: the guide, anchors, labs, exercises, and interview prompts are real and ready to study. The fully worked, line-by-line treatment (with starter, solution, and test code in every lab) follows the gold standard set by the flagship phases, Phase 02 · PagedAttention and Phase 03 · Continuous Batching. Use those two as the template for the depth to bring here.