Phase 17 — Interview Questions: Hardware Backends & Plugins

Staff/principal-level questions on this topic. Cover the answer, attempt it OUT LOUD, then compare. (See CAREER.md for how to run a full mock loop.)

Q1. How does vLLM support so many hardware backends without forking the engine?

Model answer

A Platform abstraction centralizes hardware-specific choices (device, default attention backend, supported dtypes, capabilities), and the engine queries it instead of hardcoding CUDA. New hardware can be registered out-of-tree through the plugin system, which discovers installed packages via Python entry points, so vendors add support without forking or modifying core code.

Q2. Why can you run vLLM on a CPU at all, and what's different?

Model answer

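Because the engine asks the Platform abstraction for kernels and capabilities instead of assuming CUDA, a CPU platform can slot in. It provides CPU-appropriate kernels and disables GPU-only features (certain fused and quantization kernels, CUDA graphs). It's slower, but it lets you develop and test the engine logic without a GPU, which is exactly what the [CPU-OK] labs in this course rely on.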
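The "disable GPU-only features" step can be pictured as a config-resolution pass run once at startup. A minimal Python sketch, assuming hypothetical `Capabilities` flags and an `EngineConfig` with three knobs; the names and the fallback policy are illustrative, not vLLM's actual code.

```python
import warnings
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Capabilities:
    """Hypothetical capability flags a platform might report."""
    cuda_graphs: bool
    fused_moe: bool
    fp8_kernels: bool


GPU_CAPS = Capabilities(cuda_graphs=True, fused_moe=True, fp8_kernels=True)
CPU_CAPS = Capabilities(cuda_graphs=False, fused_moe=False, fp8_kernels=False)


@dataclass(frozen=True)
class EngineConfig:
    use_cuda_graphs: bool = True
    use_fused_moe: bool = True
    quantization: Optional[str] = None


def resolve_for_platform(cfg: EngineConfig, caps: Capabilities) -> EngineConfig:
    """Downgrade speed-only features; refuse features that change the model."""
    if cfg.use_cuda_graphs and not caps.cuda_graphs:
        warnings.warn("CUDA graphs unavailable; running in eager mode")
        cfg = replace(cfg, use_cuda_graphs=False)
    if cfg.use_fused_moe and not caps.fused_moe:
        warnings.warn("fused MoE kernel unavailable; using unfused fallback")
        cfg = replace(cfg, use_fused_moe=False)
    if cfg.quantization == "fp8" and not caps.fp8_kernels:
        # No silent fallback: dropping quantization changes memory use and numerics.
        raise ValueError("fp8 quantization requires kernels this platform lacks")
    return cfg
```

The design choice in this sketch: features that only affect speed (CUDA graphs, a fused kernel) degrade with a warning, while a feature that changes what the user asked for (a quantization format) fails loudly rather than being quietly dropped.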

Going deeper

The flagship phases (02, 03) show the depth and number of questions to expect for a topic you claim as your specialty.