Phase 17 — Interview Questions: Hardware Backends & Plugins
Staff/principal-level questions on this topic. Cover the answer, attempt it OUT LOUD, then compare. (See CAREER.md for how to run a full mock loop.)
Q1. How does vLLM support so many hardware backends without forking the engine?
Model answer
A Platform abstraction centralizes hardware-specific choices (device, default attention backend, supported dtypes, capabilities), and the engine queries it instead of hardcoding CUDA. New hardware can register out-of-tree via the plugin entry-point system, so vendors add support without modifying core code.
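To make the answer concrete on a whiteboard, a minimal sketch of the idea may help. Everything here is hypothetical and is not vLLM's actual code: the `Platform` dataclass, the `register_platform` / `resolve_platform` helpers, and the `my_npu` backend are illustrative names only. A plain dict stands in for the discovery step, which in a real package would typically go through Python packaging entry points (`importlib.metadata.entry_points`).

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass(frozen=True)
class Platform:
    """Hypothetical stand-in for a platform abstraction (not vLLM's real class)."""
    device_type: str
    default_attention_backend: str
    supported_dtypes: Tuple[str, ...]
    supports_cuda_graphs: bool = False


# Hypothetical registry: backend name -> factory that builds a Platform.
_PLATFORM_REGISTRY: Dict[str, Callable[[], Platform]] = {}


def register_platform(name: str, factory: Callable[[], Platform]) -> None:
    """Called by in-tree code or by an out-of-tree vendor package."""
    _PLATFORM_REGISTRY[name] = factory


def resolve_platform(name: str) -> Platform:
    """Look up a backend by name and build its Platform object."""
    try:
        return _PLATFORM_REGISTRY[name]()
    except KeyError:
        raise ValueError(f"no platform registered under {name!r}") from None


def pick_dtype(platform: Platform, requested: str) -> str:
    """Engine-side code asks the platform instead of assuming CUDA."""
    if requested in platform.supported_dtypes:
        return requested
    # Fall back to the platform's first (preferred) dtype.
    return platform.supported_dtypes[0]


# In-tree backend (values are illustrative).
register_platform(
    "cuda",
    lambda: Platform("cuda", "flash_attn", ("bfloat16", "float16", "float32"), True),
)
# An out-of-tree vendor would make this call from its own package.
register_platform("my_npu", lambda: Platform("npu", "vendor_attn", ("float16",)))
```

The point to make out loud: `pick_dtype` never mentions a vendor, so adding `my_npu` required no change to engine-side code.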
Q2. Why can you run vLLM on a CPU at all, and what's different?
Model answer
The CPU platform provides CPU-appropriate kernels and disables GPU-only features (certain fused/quant kernels, CUDA graphs). It's slower than a GPU backend, but it lets you develop and test the engine logic without an accelerator, which is exactly what the [CPU-OK] labs in this course rely on.
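One way to illustrate "disables GPU-only features" is capability gating: the engine reads flags from the platform and downgrades the requested configuration instead of failing. This is a sketch under assumptions, not vLLM's implementation; `Capabilities`, `effective_config`, and the flag names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass(frozen=True)
class Capabilities:
    """Hypothetical per-platform feature flags."""
    device_type: str
    cuda_graphs: bool
    fused_quant_kernels: bool


CPU = Capabilities("cpu", cuda_graphs=False, fused_quant_kernels=False)
GPU = Capabilities("cuda", cuda_graphs=True, fused_quant_kernels=True)


def effective_config(
    caps: Capabilities, want_cuda_graphs: bool, want_fused_quant: bool
) -> Dict[str, Union[str, bool]]:
    """Combine what the user asked for with what the platform supports."""
    return {
        "device": caps.device_type,
        # CUDA graphs are only used when requested AND supported.
        "use_cuda_graphs": want_cuda_graphs and caps.cuda_graphs,
        # Without fused kernels, fall back to a slower reference path.
        "quant_path": "fused" if (want_fused_quant and caps.fused_quant_kernels) else "reference",
    }


cpu_cfg = effective_config(CPU, want_cuda_graphs=True, want_fused_quant=True)
gpu_cfg = effective_config(GPU, want_cuda_graphs=True, want_fused_quant=True)
```

The same request yields a different effective configuration per platform, while the code that consumes the configuration stays identical. That is why engine logic exercised on CPU remains meaningful.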
Going deeper
The flagship phases (02, 03) show the depth and number of questions to expect for a topic you claim as your specialty.