Phase 17 — Hardware Backends & Plugins

Phase 16 · Course home · Phase 18

Don't Panic

vLLM runs on NVIDIA and AMD GPUs, CPUs, TPUs, Gaudi, and more. It does this by hiding every hardware difference behind a Platform abstraction and a plugin system, so the engine code stays hardware-agnostic and new accelerators arrive as plugins. This phase covers that abstraction, and you'll run the CPU backend with no GPU at all.
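To make "the engine code stays hardware-agnostic" concrete, here is a minimal sketch of the idea. Every name in it (`Platform`, `CudaLikePlatform`, `CpuPlatform`, `plan_engine`, and the backend strings) is hypothetical and simplified for illustration; it is not vLLM's real class or API, which the deep-dive reads from the pinned source.

```python
from abc import ABC, abstractmethod


class Platform(ABC):
    """Hypothetical, simplified platform interface (not vLLM's real class)."""

    device_type: str = "unknown"

    @abstractmethod
    def default_attention_backend(self) -> str:
        """Name of the attention backend this hardware prefers."""

    def supports_fp8(self) -> bool:
        # Conservative default: a capability is off unless a platform opts in.
        return False

    def supports_graph_capture(self) -> bool:
        return False


class CudaLikePlatform(Platform):
    device_type = "cuda"

    def default_attention_backend(self) -> str:
        return "flash-attn-like"  # illustrative label only

    def supports_fp8(self) -> bool:
        return True

    def supports_graph_capture(self) -> bool:
        return True


class CpuPlatform(Platform):
    device_type = "cpu"

    def default_attention_backend(self) -> str:
        return "sdpa-like"  # illustrative label only


def plan_engine(platform: Platform, want_fp8: bool) -> dict:
    """Engine-side logic: it asks the platform and never names a vendor."""
    return {
        "device": platform.device_type,
        "attention": platform.default_attention_backend(),
        # A gated feature falls back when the platform does not offer it.
        "kv_dtype": "fp8" if want_fp8 and platform.supports_fp8() else "bf16",
        "graph_capture": platform.supports_graph_capture(),
    }


if __name__ == "__main__":
    print(plan_engine(CudaLikePlatform(), want_fp8=True))
    print(plan_engine(CpuPlatform(), want_fp8=True))
```

The point of the design is that `plan_engine` contains no `if cuda:` branch. Supporting a new device means adding a `Platform` subclass, not editing the engine.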

Why this phase matters

Hardware breadth is a strategic advantage (GPU supply, cost arbitrage) and the Platform abstraction is a clean piece of architecture worth studying. Knowing where the seams are lets you reason about porting and about why a feature is available on one backend but not another.

What you'll learn

  • The Platform abstraction: device type, attention backend default, capabilities
  • How the engine queries the platform instead of hardcoding CUDA
  • The out-of-tree plugin system (entry points) for new hardware
  • CPU backend: what changes (no paging kernels? threading? dtype support?)
  • Why some features are platform-gated (FP8, CUDA graphs, certain kernels)
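The entry-point mechanism mentioned in the list above can be sketched with the standard library alone. This is an illustration under stated assumptions: the group name `demo.platform_plugins` is made up, and the convention that a plugin is a callable returning an identifier (or `None` to decline) is assumed here for the sketch, so check the pinned upstream source for the real group name and contract. The demo uses two standard-library functions as stand-in plugins so it runs without installing anything.

```python
from importlib.metadata import EntryPoint, entry_points

# Assumption: illustrative group name, not necessarily the one upstream uses.
PLUGIN_GROUP = "demo.platform_plugins"


def discover_platforms(candidates=None) -> dict:
    """Load each entry point, call it, and keep the ones that do not decline.

    Assumed plugin contract for this sketch: a zero-argument callable that
    returns an identifier, or None when its hardware is not present.
    """
    if candidates is None:
        # Python 3.10+ signature: select installed entry points by group.
        candidates = entry_points(group=PLUGIN_GROUP)
    found = {}
    for ep in candidates:
        activate = ep.load()  # imports the module and resolves the attribute
        result = activate()
        if result is not None:
            found[ep.name] = result
    return found


if __name__ == "__main__":
    # Stand-ins for real plugins: gc.enable() returns None ("declines"),
    # os.getcwd() returns a string ("accepts").
    demo = [
        EntryPoint(name="declines", value="gc:enable", group=PLUGIN_GROUP),
        EntryPoint(name="accepts", value="os:getcwd", group=PLUGIN_GROUP),
    ]
    print(sorted(discover_platforms(demo)))
```

A real out-of-tree package would declare its entry point in its own packaging metadata, which is why new hardware can be added without touching the core repository.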

The map: where this lives in the real code

Open these in upstream/ (pinned to v0.22.1 @ 0decac0, see UPSTREAM_PIN.md). The deep-dive (01-deep-dive.md) walks through the important ones line by line.

Labs in this phase

  • lab-01-platform-abstraction [CPU-OK] — build the Platform interface, registry, resolver (CPU floor + loud override), then register an out-of-tree platform and change the engine's decisions with zero core edits — plus the duplicate-registration supply-chain guard.
  • lab-02-run-cpu-vllm [CPU-OK] — run vLLM on laptop cores and read cpu.py against lab-01's interface: every override checked off, the Phase 1–3 engine untouched. Captured output included.
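As a warm-up for lab-01, the registry and resolver ideas (CPU floor, loud override, duplicate-registration guard) might look roughly like the sketch below. The class name, method names, priority scheme, and logger name are all assumptions made for illustration; the lab's actual interface may differ.

```python
import logging

logger = logging.getLogger("mini_platforms")  # hypothetical logger name


class PlatformRegistry:
    """Hypothetical registry: maps a platform name to (probe, priority)."""

    def __init__(self):
        # CPU floor: always registered and always available, lowest priority.
        self._entries = {"cpu": (lambda: True, 0)}

    def register(self, name, is_available, priority=10):
        # Duplicate guard: a second package must not silently replace
        # a platform that is already registered under the same name.
        if name in self._entries:
            raise ValueError(f"platform {name!r} is already registered")
        self._entries[name] = (is_available, priority)

    def resolve(self, override=None):
        if override is not None:
            if override not in self._entries:
                raise KeyError(f"unknown platform override {override!r}")
            # Loud override: forcing a platform always leaves a log line.
            logger.warning("platform forced to %r by override", override)
            return override
        usable = [
            (priority, name)
            for name, (probe, priority) in self._entries.items()
            if probe()
        ]
        # The CPU floor guarantees that `usable` is never empty.
        return max(usable)[1]


if __name__ == "__main__":
    registry = PlatformRegistry()
    print(registry.resolve())  # only the floor is registered
    registry.register("toy_npu", lambda: True, priority=50)
    print(registry.resolve())  # the out-of-tree platform now wins
    print(registry.resolve(override="cpu"))
```

Registering `toy_npu` changes what `resolve()` returns without any edit to the registry itself, which is the "zero core edits" property the lab asks you to demonstrate.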

See labs/README.md for how to run them.

How to work this phase

  1. Read this guide for intuition.
  2. Read 01-deep-dive.md with the upstream/ files open.
  3. Do 02-mini-build.md — build the mini_vllm piece yourself.
  4. Run the labs, then attempt EXERCISES.md.
  5. Self-test with INTERVIEW.md; keep CHEATSHEET.md handy.

Where you are

This is one of the scaffolded phases: the guide, anchors, labs, exercises, and interview prompts are real and ready to study. The fully worked, line-by-line treatment (with starter/solution/test code in every lab) follows the gold standard set by the flagship phases, Phase 02 · PagedAttention and Phase 03 · Continuous Batching. Use those two as the template for the depth to bring here.
