Phase 17 — Interview Questions: Hardware Backends & Plugins

Staff/principal-level questions on this topic. Cover the answer, attempt it OUT LOUD, then compare. (See CAREER.md for how to run a full mock loop.)

Q1. How does vLLM support so many hardware backends without forking the engine?

Model answer

A Platform abstraction centralizes hardware-specific choices (device, default attention backend, supported dtypes, capabilities), and the engine queries it instead of hardcoding CUDA. New hardware can register out-of-tree via the plugin entry-point system, so vendors add support without modifying core code.
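To make the answer concrete in an interview, it can help to sketch the pattern on a whiteboard. The following is a minimal, self-contained illustration of a platform abstraction with a registration hook, not vLLM's actual code: the names `Platform`, `register_platform`, `resolve_platform`, and `pick_dtype` are hypothetical, and a plain dict stands in for the packaging entry-point mechanism that lets out-of-tree code register itself.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple


@dataclass(frozen=True)
class Platform:
    """Hypothetical description of one hardware backend."""
    device_type: str
    default_attention_backend: str
    supported_dtypes: Tuple[str, ...]
    supports_graph_capture: bool


# Maps a plugin name to a factory. The factory returns a Platform when its
# hardware is usable and None otherwise. In a real package, entry points
# would populate this; a dict keeps the sketch runnable anywhere.
_PLUGINS: Dict[str, Callable[[], Optional[Platform]]] = {}


def register_platform(name: str, factory: Callable[[], Optional[Platform]]) -> None:
    """Called by in-tree or out-of-tree code to announce a backend."""
    _PLUGINS[name] = factory


def resolve_platform() -> Platform:
    """Return the first registered platform that reports itself as available."""
    for factory in _PLUGINS.values():
        platform = factory()
        if platform is not None:
            return platform
    raise RuntimeError("no usable platform registered")


def pick_dtype(platform: Platform, requested: str) -> str:
    """Engine-side code asks the platform instead of hardcoding one device."""
    if requested in platform.supported_dtypes:
        return requested
    return platform.supported_dtypes[0]  # fall back to the platform's first choice


# A vendor plugin for an imaginary accelerator that is not present here.
register_platform("toy_accel", lambda: None)

# A CPU fallback that is always available.
register_platform(
    "cpu",
    lambda: Platform(
        device_type="cpu",
        default_attention_backend="toy_cpu_attention",
        supported_dtypes=("float32", "bfloat16"),
        supports_graph_capture=False,
    ),
)

current = resolve_platform()
print(current.device_type, pick_dtype(current, "float16"))
```

The point to draw out is that `pick_dtype` never mentions a device by name: adding a backend means registering another factory, while the engine-side function stays unchanged.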

Q2. Why can you run vLLM on a CPU at all, and what's different?

Model answer

The CPU platform provides CPU-appropriate kernels and disables GPU-only features such as CUDA graphs and certain fused or quantized kernels. It's slower, but the engine logic is the same, so you can develop and test it without a GPU, which is exactly what the [CPU-OK] labs in this course rely on.
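The feature gating described above can be sketched as a small function that reconciles requested options with what a platform supports. This is an illustrative sketch under assumptions, not vLLM's implementation: `Capabilities`, `EngineOptions`, and `apply_platform_limits` are hypothetical names, and the choice to fall back silently for graph capture but raise for an unsupported quantization scheme is one possible policy.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Capabilities:
    """Hypothetical capability flags reported by a platform."""
    graph_capture: bool
    fused_quant_kernels: bool


@dataclass(frozen=True)
class EngineOptions:
    """Hypothetical user-requested options."""
    use_graph_capture: bool
    quantization: Optional[str] = None


def apply_platform_limits(
    opts: EngineOptions, caps: Capabilities
) -> Tuple[EngineOptions, List[str]]:
    """Return options adjusted to the platform, plus notes on what changed."""
    notes: List[str] = []

    # Graph capture is an optimization, so falling back to eager execution
    # changes speed but not results.
    use_graphs = opts.use_graph_capture
    if use_graphs and not caps.graph_capture:
        use_graphs = False
        notes.append("graph capture unsupported; running eagerly")

    # Quantization changes which weights can be loaded, so fail loudly
    # rather than silently ignoring the request.
    if opts.quantization is not None and not caps.fused_quant_kernels:
        raise ValueError(
            f"quantization {opts.quantization!r} needs kernels this platform lacks"
        )

    return EngineOptions(use_graphs, opts.quantization), notes


CPU_CAPS = Capabilities(graph_capture=False, fused_quant_kernels=False)

adjusted, notes = apply_platform_limits(EngineOptions(use_graph_capture=True), CPU_CAPS)
print(adjusted.use_graph_capture, notes)
```

Everything outside this function, such as scheduling and request handling, is untouched by the adjustment, which is why a CPU run is still useful for testing engine logic.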

Going deeper

The flagship phases (02, 03) show the depth and number of questions to expect for a topic you claim as your specialty.
