Setup
This course is designed so that the majority of labs run on a laptop CPU. You only need
a GPU for the labs explicitly tagged [GPU-REQ] (and even those ship captured output so you
can learn without one).
Contents
- 1. Python environment
- 2. Get the real vLLM source (required for the deep-dives)
- 3. (Optional) Install the real engine for the GPU labs
- 4. Running the labs and tests
- 5. Cheap GPU access for the [GPU-REQ] labs
- 6. Models used in labs
- Troubleshooting
1. Python environment
We follow vLLM's own convention and use uv, which is fast and is what upstream uses
(see upstream/AGENTS.md once you have cloned it in step 2). Plain venv works too.
# Install uv (one time)
curl -LsSf https://astral.sh/uv/install.sh | sh
# From the repo root:
uv venv --python 3.12
source .venv/bin/activate
# Install the CPU-only course dependencies (numpy + pytest). This is all you need
# for mini_vllm and every [CPU-OK] lab.
uv pip install -e .
To run the torch-based labs (some Phase 2/4 mini-builds), add the CPU build of torch:
uv pip install -e ".[torch]" # CPU wheels are fine; no CUDA needed for mini_vllm
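To confirm which optional pieces ended up in the active environment, a small check along these lines may help. It is a sketch only: the helper name installed is hypothetical and not part of the course code, and it relies solely on the standard library's importlib.util.find_spec, which reports whether a top-level module can be found without importing it.

```python
import importlib.util


def installed(modules):
    """Map each top-level module name to True if it can be found, without importing it."""
    return {name: importlib.util.find_spec(name) is not None for name in modules}


# numpy and pytest come from the base install; torch and vllm are optional extras.
for name, present in installed(["numpy", "pytest", "torch", "vllm"]).items():
    print(f"{name:8s} {'found' if present else 'missing'}")
```

On a laptop set up with only the base install, torch and vllm showing as missing is expected.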
2. Get the real vLLM source (required for the deep-dives)
Every 01-deep-dive.md cites upstream/... paths. Clone the pinned tree:
git clone --depth 1 --branch v0.22.1 \
https://github.com/vllm-project/vllm.git upstream
cd upstream && git rev-parse HEAD # 0decac0d96c42b49572498019f0a0e3600f50398
cd ..
You do not need to install vLLM to read its source. (upstream/ is gitignored.)
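If you want to check the pin programmatically, for example before trusting a line number in a deep-dive, the comparison can be sketched as below. The function names matches_pin and upstream_head are illustrative and not part of the course repo; the hash is the one shown in the clone step, and the only external command used is git rev-parse HEAD.

```python
import subprocess

# Commit hash shown in the clone step above (tag v0.22.1).
PINNED_COMMIT = "0decac0d96c42b49572498019f0a0e3600f50398"


def matches_pin(head: str, pinned: str = PINNED_COMMIT) -> bool:
    """Compare a rev-parse result to the pinned hash, ignoring whitespace and case."""
    return head.strip().lower() == pinned.lower()


def upstream_head(repo_dir: str = "upstream") -> str:
    """Return the output of `git rev-parse HEAD` for the given checkout."""
    result = subprocess.run(
        ["git", "-C", repo_dir, "rev-parse", "HEAD"],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the directory is not a git checkout
    )
    return result.stdout
```

From the repo root, matches_pin(upstream_head()) returning False means the checkout has drifted from the pin and should be re-cloned.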
3. (Optional) Install the real engine for the GPU labs
The real vllm package needs a CUDA build of torch and an NVIDIA GPU. Install it only on
a GPU box:
uv pip install -e ".[vllm]" # vllm==0.22.1, matches the pin
4. Running the labs and tests
# All CPU tests (mini_vllm + flagship labs). Run this constantly.
pytest -m "not gpu"
# Just one phase's labs
pytest phase-02-paged-attention/labs
# The mini engine's own test suite
pytest mini_vllm
# On a GPU box, also run the GPU-tagged tests
pytest -m gpu
GPU tests are auto-skipped when no CUDA device is present (see the gpu_device fixture in
each phase's conftest.py), so a plain pytest run on a laptop reports them as skipped rather than failed.
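The actual fixture lives in each phase's conftest.py; the following is only a guess at its general shape, written to illustrate the auto-skip pattern. The helper cuda_available and the returned device string are assumptions, while pytest.fixture, pytest.skip, and torch.cuda.is_available are real APIs.

```python
import importlib.util

import pytest


def cuda_available() -> bool:
    """Return True only if torch is importable and reports a usable CUDA device."""
    if importlib.util.find_spec("torch") is None:
        return False  # base install without the torch extra
    import torch

    return torch.cuda.is_available()


@pytest.fixture
def gpu_device():
    """Hypothetical sketch: skip the requesting test when no CUDA device exists."""
    if not cuda_available():
        pytest.skip("no CUDA device present")
    return "cuda:0"  # assumed device string; the course fixture may return something else
```

A test opts in by taking gpu_device as an argument. When the skip fires, pytest reports the test as skipped, not failed.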
5. Cheap GPU access for the [GPU-REQ] labs
You do not need to own a GPU. Options, lowest effort first:
| Option | Notes |
|---|---|
| Google Colab (free/Pro) | Free T4 is enough for small-model vLLM labs. Easiest start. |
| Modal / RunPod / Lambda / Vast.ai | Per-second/per-hour A10/L4/A100 rentals. ~$0.4–$2/hr for the GPUs these labs use. |
| Cloud spot instances (AWS g5, GCP g2) | Cheapest sustained; more setup. |
A T4 or L4 (16–24 GB) runs every GPU lab in this course with a small model
(e.g. facebook/opt-125m, Qwen/Qwen2.5-0.5B). You will never need an 80 GB card to learn.
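A rough back-of-the-envelope check shows why: at 16-bit precision each parameter takes 2 bytes, so weight memory is roughly the parameter count times two. The sketch below uses nominal parameter counts read off the model names (125M and 0.5B), which are approximations, and the helper name weight_gib is made up for illustration. It counts weights only; KV cache and activations need additional room.

```python
def weight_gib(num_params: float, bytes_per_param: int = 2) -> float:
    """Approximate weight memory in GiB (default 2 bytes per parameter, i.e. fp16/bf16)."""
    return num_params * bytes_per_param / 2**30


# Nominal sizes taken from the model names; real checkpoints differ slightly.
nominal_params = {
    "facebook/opt-125m": 125e6,
    "Qwen/Qwen2.5-0.5B": 0.5e9,
}

for name, params in nominal_params.items():
    print(f"{name}: ~{weight_gib(params):.2f} GiB of weights")
```

Both estimates come out under 1 GiB, a small fraction of a 16 GB T4.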
6. Models used in labs
Labs default to tiny models so they download fast and fit small GPUs (and some run on
CPU): facebook/opt-125m, Qwen/Qwen2.5-0.5B-Instruct, TinyLlama/TinyLlama-1.1B. Each
lab README names the exact model and the huggingface-cli download command.
Troubleshooting
- pytest collects 0 tests → run from the repo root (so pyproject.toml is found).
- import vllm fails on a laptop → expected; the real engine needs CUDA. Use the [CPU-OK] labs and mini_vllm on a laptop; the captured outputs cover the rest.
- Line numbers in a deep-dive don't match → you're not at the pinned commit. Re-clone per step 2, or search for the named function instead of trusting the line number.