Setup

Most labs in this course run on a laptop CPU. You need a GPU only for the labs explicitly tagged [GPU-REQ], and even those ship captured output so you can learn without one.

1. Python environment
2. Get the real vLLM source (required for the deep-dives)
3. (Optional) Install the real engine for the GPU labs
4. Running the labs and tests
5. Cheap GPU access for the [GPU-REQ] labs
6. Models used in labs
Troubleshooting

1. Python environment

We follow vLLM's own convention and use uv, which is fast and is what upstream uses (see upstream/AGENTS.md). Plain venv works too.

# Install uv (one time)
curl -LsSf https://astral.sh/uv/install.sh | sh

# From the repo root:
uv venv --python 3.12
source .venv/bin/activate

# Install the CPU-only course dependencies (numpy + pytest). This is all you need
# for mini_vllm and every [CPU-OK] lab.
uv pip install -e .

To run the torch-based labs (some Phase 2/4 mini-builds), add the CPU build of torch:

uv pip install -e ".[torch]"   # CPU wheels are fine; no CUDA needed for mini_vllm
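
To confirm which labs the current environment can run, a small check like the following may help. It is a sketch and not part of the course repo: the function name installed is hypothetical, and the package-to-purpose mapping simply restates the requirements described in this guide.

```python
import importlib.util


def installed(packages):
    """Map each top-level package name to whether it can be imported here."""
    return {name: importlib.util.find_spec(name) is not None for name in packages}


if __name__ == "__main__":
    # Package-to-purpose mapping taken from this setup guide.
    purposes = {
        "numpy": "[CPU-OK] labs and mini_vllm",
        "pytest": "running the test suites",
        "torch": "torch-based labs",
        "vllm": "[GPU-REQ] labs (GPU box only)",
    }
    status = installed(purposes)
    for name, purpose in purposes.items():
        mark = "ok" if status[name] else "missing"
        print(f"{name:8s} {mark:8s} needed for {purpose}")
```

A missing torch or vllm is expected on a laptop that only installed the CPU-only dependencies.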

2. Get the real vLLM source (required for the deep-dives)

Every 01-deep-dive.md cites upstream/... paths. Clone the pinned tree:

git clone --depth 1 --branch v0.22.1 \
  https://github.com/vllm-project/vllm.git upstream
cd upstream && git rev-parse HEAD   # 0decac0d96c42b49572498019f0a0e3600f50398
cd ..

You do not need to install vLLM to read its source. (upstream/ is gitignored.)
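
Because the deep-dives cite line numbers at the pinned commit, it can be worth checking the clone before reading. The sketch below compares the HEAD of upstream/ against the hash shown in the clone command above. The helper names matches_pin and upstream_head are hypothetical and not part of the course repo, and the script assumes git is on PATH.

```python
import subprocess

# Commit hash for v0.22.1 as printed in step 2 of this guide.
PINNED_COMMIT = "0decac0d96c42b49572498019f0a0e3600f50398"


def matches_pin(head, pinned=PINNED_COMMIT):
    """Return True if `head` (full or abbreviated hash) names the pinned commit."""
    head = head.strip().lower()
    # Require at least 7 hex characters so an empty string never matches.
    return len(head) >= 7 and pinned.startswith(head)


def upstream_head(repo="upstream"):
    """Ask git for the HEAD commit of the cloned tree (requires git on PATH)."""
    result = subprocess.run(
        ["git", "-C", repo, "rev-parse", "HEAD"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    try:
        head = upstream_head()
    except (OSError, subprocess.CalledProcessError):
        print("upstream/ is missing or is not a git checkout; see step 2")
    else:
        print("pinned" if matches_pin(head) else f"not pinned: {head}")
```

Run it from the repo root so the relative path upstream resolves.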

3. (Optional) Install the real engine for the GPU labs

The real vllm package needs a CUDA build of torch and an NVIDIA GPU. Install it only on a GPU box:

uv pip install -e ".[vllm]"   # vllm==0.22.1, matches the pin

4. Running the labs and tests

# All CPU tests (mini_vllm + flagship labs). Run this constantly.
pytest -m "not gpu"

# Just one phase's labs
pytest phase-02-paged-attention/labs

# The mini engine's own test suite
pytest mini_vllm

# On a GPU box, also run the GPU-tagged tests
pytest -m gpu

GPU tests are auto-skipped when no CUDA device is present (see the gpu_device fixture in each phase's conftest.py), so on a laptop they are reported as skipped rather than failed.
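
The skip decision such a fixture makes can be sketched as follows. This is an illustration rather than the course's actual conftest.py: the helper name gpu_skip_reason is hypothetical, and it assumes CUDA is detected through torch.cuda.is_available(), which is a common approach but is not confirmed by this guide.

```python
import importlib.util


def gpu_skip_reason():
    """Return None if a CUDA device is usable, else a human-readable skip reason."""
    # On a CPU-only install torch may be absent, so look it up without importing.
    if importlib.util.find_spec("torch") is None:
        return "torch is not installed"
    import torch  # imported lazily, only once we know it is installed

    if not torch.cuda.is_available():
        return "no CUDA device present"
    return None


if __name__ == "__main__":
    reason = gpu_skip_reason()
    if reason is None:
        print("GPU tests would run")
    else:
        print(f"GPU tests would skip: {reason}")
```

A gpu_device fixture could call a helper like this and pass any non-None reason to pytest.skip(), which marks the test as skipped instead of failed.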

5. Cheap GPU access for the [GPU-REQ] labs

You do not need to own a GPU. Options, cheapest-effort first:

Option	Notes
Google Colab (free/Pro)	Free T4 is enough for small-model vLLM labs. Easiest start.
Modal / RunPod / Lambda / Vast.ai	Per-second/per-hour A10/L4/A100 rentals. ~$0.4–$2/hr for the GPUs these labs use.
Cloud spot instances (AWS g5, GCP g2)	Cheapest for sustained use; more setup.

A T4 or L4 (16–24 GB) runs every GPU lab in this course with a small model (e.g. facebook/opt-125m, Qwen/Qwen2.5-0.5B). You will never need an 80 GB card to learn.

6. Models used in labs

Labs default to tiny models so they download fast and fit small GPUs (and some run on CPU): facebook/opt-125m, Qwen/Qwen2.5-0.5B-Instruct, TinyLlama/TinyLlama-1.1B. Each lab README names the exact model and the huggingface-cli download command.
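
As a rough sanity check on why these models fit a small GPU, weight memory is approximately the parameter count times the bytes per parameter. The sketch below uses nominal parameter counts read off the model names and assumes 16-bit weights (2 bytes each). Both are approximations, and the KV cache and activations need additional memory on top.

```python
def weight_gb(num_params, bytes_per_param=2):
    """Approximate weight memory in GB (1 GB = 1e9 bytes); 2 bytes = fp16/bf16."""
    return num_params * bytes_per_param / 1e9


# Nominal sizes taken from the model names, not exact parameter counts.
NOMINAL_PARAMS = {
    "facebook/opt-125m": 125e6,
    "Qwen/Qwen2.5-0.5B-Instruct": 0.5e9,
    "TinyLlama/TinyLlama-1.1B": 1.1e9,
}

if __name__ == "__main__":
    for name, params in NOMINAL_PARAMS.items():
        print(f"{name}: ~{weight_gb(params):.2f} GB of weights")
```

Under these assumptions the largest model listed needs roughly 2.2 GB for its weights, well under the 16 GB of a T4.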

Troubleshooting

pytest collects 0 tests → run from the repo root (so pyproject.toml is found).
import vllm fails on a laptop → expected; the real engine needs CUDA. Use the [CPU-OK] labs and mini_vllm on a laptop; the captured outputs cover the rest.
Line numbers in a deep-dive don't match → you're not at the pinned commit. Re-clone per step 2, or search for the named function instead of trusting the line number.

vLLM Mastery — From Zero to Maintainer