Phase 17 Labs — Hardware Backends & Plugins

Two labs on the layer that lets one engine speak to any silicon. In lab-01 you build the platform interface, the registry, and the resolver, then register an out-of-tree platform and change the engine's decisions with zero core edits. In lab-02 you run the most direct demonstration available: vLLM on your laptop's CPU, with cpu.py read against the interface you built.

CPU labs follow the standard contract: starter.py (your work), solution.py (reference), test_lab.py (the spec). By default the tests run against the solution; setting LAB_IMPL=starter grades yours.

# Whole phase:
pytest phase-17-hardware-backends-and-plugins/labs -m "not gpu"

# Grade yourself:
LAB_IMPL=starter pytest phase-17-hardware-backends-and-plugins/labs/lab-01-platform-abstraction -q
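
For orientation, here is one way the LAB_IMPL switch could be wired inside a test file. This is a minimal sketch, not the labs' actual test_lab.py: the helper names impl_name and load_impl are hypothetical, and it assumes starter.py and solution.py are importable as top-level modules from the lab directory.

```python
import importlib
import os

ALLOWED = ("solution", "starter")


def impl_name(env=None):
    """Pick the implementation to grade; default to the reference solution."""
    env = os.environ if env is None else env
    name = env.get("LAB_IMPL", "solution")
    if name not in ALLOWED:
        # Fail loudly on typos instead of silently grading the wrong file.
        raise ValueError(f"LAB_IMPL must be one of {ALLOWED}, got {name!r}")
    return name


def load_impl(env=None):
    """Import and return the chosen module (hypothetical helper)."""
    return importlib.import_module(impl_name(env))
```

A test file written this way would call load_impl() once at import time and run every assertion against the returned module, so the same spec grades both files.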

Labs

lab-01-platform-abstraction [CPU-OK]

The funnel: a Platform interface answering every hardware question (attention backend, dtypes, graph support), a registry with a CPU floor and a loud override, and the test that is the architecture — an out-of-tree "vendor" platform changes the engine's decisions without touching core. Plus the supply-chain guard: duplicate registration refused. Skills: the registry trilogy completed (attention → models → platforms); capability negotiation over assumption; plugins as additive hardware support; tests as architecture proofs.
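
The shape of the lab can be sketched in a few lines. Everything below is illustrative, not the lab's reference solution: the names Platform, PlatformRegistry, resolve, and plan are assumptions, as is the reading of "loud override" as an explicit choice that raises on an unknown name. The CPU values follow what lab-02 describes (Torch SDPA for attention, no graph capture); the dtype tuple is a placeholder.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Platform:
    """Answers the hardware questions the engine would otherwise hard-code."""
    name: str
    attention_backend: str
    dtypes: tuple
    supports_graphs: bool
    available: bool = True


# The floor: always present, always usable.
CPU = Platform("cpu", "torch_sdpa", ("float32", "bfloat16"), False)


class PlatformRegistry:
    def __init__(self, floor):
        self._floor = floor
        self._platforms = {floor.name: floor}

    def register(self, platform):
        # Supply-chain guard: a second registration under the same name is refused.
        if platform.name in self._platforms:
            raise ValueError(f"platform {platform.name!r} is already registered")
        self._platforms[platform.name] = platform

    def resolve(self, override=None):
        if override is not None:
            # Loud override (assumed meaning): an unknown name is an error, not a fallback.
            if override not in self._platforms:
                raise LookupError(f"unknown platform override {override!r}")
            return self._platforms[override]
        # Otherwise prefer any available non-floor platform, then fall to the floor.
        for platform in self._platforms.values():
            if platform is not self._floor and platform.available:
                return platform
        return self._floor


def plan(platform):
    """Engine-side decisions derived only from platform answers."""
    return {
        "attention": platform.attention_backend,
        "use_graphs": platform.supports_graphs,
    }
```

Registering a vendor platform changes what plan returns while plan itself stays untouched, which is the property the lab's architecture test checks.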

lab-02-run-cpu-vllm [CPU-OK]

vLLM on laptop cores: the platform resolver choosing the floor, Torch SDPA standing in for flash attention, KV carved from RAM by VLLM_CPU_KVCACHE_SPACE, graphs degrading to eager — and the whole Phase 1–3 engine running unmodified, because none of it was ever a GPU concept. Read cpu.py against lab-01 and check off every override; note what a backend doesn't have to implement. Captured run included (your tok/s will differ; nothing else will). Skills: knob translation across platforms; the CPU roofline pricing the 9 tok/s; what to ask a vendor pitching "vLLM support."
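
Two of the numbers in this lab can be estimated before running anything. The sketch below uses the usual back-of-envelope model, not vLLM code. Single-stream decode is treated as memory-bandwidth bound, so each generated token reads every weight once. The KV budget is treated as a plain division of VLLM_CPU_KVCACHE_SPACE (assumed here to be in GiB) by the per-token KV footprint. The function names and all example figures are hypothetical, chosen for easy arithmetic and not taken from the captured run.

```python
GIB = 2 ** 30


def decode_tok_per_s_ceiling(n_params, bytes_per_param, mem_bandwidth_gb_s):
    """Upper bound on decode speed if every token streams all weights from RAM."""
    weight_bytes = n_params * bytes_per_param
    return mem_bandwidth_gb_s * 1e9 / weight_bytes


def kv_token_capacity(kv_space_gib, n_layers, n_kv_heads, head_dim, bytes_per_elem):
    """How many tokens of KV cache fit in the space carved out of RAM."""
    # K and V are each stored per layer, per KV head, per head dimension.
    bytes_per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem
    return (kv_space_gib * GIB) // bytes_per_token


# Hypothetical laptop: 1B parameters in 2-byte weights, 40 GB/s of memory bandwidth.
print(decode_tok_per_s_ceiling(1_000_000_000, 2, 40))  # 20.0

# Hypothetical model shape: 16 layers, 8 KV heads, head_dim 64, 2-byte cache, 4 GiB budget.
print(kv_token_capacity(4, 16, 8, 64, 2))  # 131072
```

Measured throughput should land below the ceiling, since the model ignores compute, cache effects, and framework overhead; the gap is what the roofline discussion in the lab prices.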

What you can do after this phase

Explain how one engine serves five silicon families; evaluate or review a hardware plugin by what it overrides and what it leaves alone; run and tune vLLM where there is no GPU at all; and place any hardware question ("does X support fp8? graphs? custom all-reduce?") at the platform boundary where its answer lives. Phase 18 measures what all these layers cost; Phase 19 sends you upstream.