Phase 17 — Mini-Build: extend mini_vllm
Your task
Add a 'platform' abstraction to mini_vllm: a base class that exposes the device, the default dtype, and the default backend, plus a CPU implementation of it. Then change the engine to consult the platform instead of hardcoding those values, mirroring vLLM's Platform.
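A minimal sketch of the task, assuming numpy for the KV-cache buffer and assuming "backend" means the attention backend. Every name here (Platform, CpuPlatform, current_platform, MiniEngine, the MINI_VLLM_PLATFORM environment variable, the "naive_numpy" backend string) is hypothetical and chosen for illustration. It is loosely modeled on the idea behind vLLM's Platform, not on its actual API. The point is the direction of the dependency: the engine asks the platform for device, dtype, and backend, and only explicit user arguments override those answers.

```python
from __future__ import annotations

import os
from abc import ABC, abstractmethod

import numpy as np


class Platform(ABC):
    """Hypothetical base class: everything the engine needs to know about the hardware."""

    name: str = "abstract"

    @property
    @abstractmethod
    def device(self) -> str:
        """Device string the engine should place buffers on, e.g. "cpu"."""

    @property
    @abstractmethod
    def default_dtype(self) -> np.dtype:
        """dtype used when the caller does not request one."""

    @abstractmethod
    def default_attn_backend(self) -> str:
        """Name of the attention backend used when none is requested."""


class CpuPlatform(Platform):
    name = "cpu"

    @property
    def device(self) -> str:
        return "cpu"

    @property
    def default_dtype(self) -> np.dtype:
        # float32 is the safe CPU default here; numpy has no native bfloat16.
        return np.dtype(np.float32)

    def default_attn_backend(self) -> str:
        return "naive_numpy"  # hypothetical backend name


# Hypothetical registry: one entry per supported platform.
_REGISTRY: dict[str, type[Platform]] = {"cpu": CpuPlatform}


def current_platform() -> Platform:
    # Hypothetical env-var override; falls back to CPU when unset.
    name = os.environ.get("MINI_VLLM_PLATFORM", "cpu")
    try:
        return _REGISTRY[name]()
    except KeyError:
        raise ValueError(f"unknown platform {name!r}; known: {sorted(_REGISTRY)}") from None


class MiniEngine:
    def __init__(
        self,
        num_blocks: int,
        block_size: int,
        head_dim: int,
        platform: Platform | None = None,
        dtype: str | None = None,
        attn_backend: str | None = None,
    ) -> None:
        self.platform = platform if platform is not None else current_platform()
        # Explicit arguments win; otherwise ask the platform instead of hardcoding.
        self.dtype = np.dtype(dtype) if dtype is not None else self.platform.default_dtype
        self.attn_backend = attn_backend or self.platform.default_attn_backend()
        self.device = self.platform.device
        # The KV cache is allocated with whatever dtype the platform chose.
        self.kv_cache = np.zeros((num_blocks, block_size, head_dim), dtype=self.dtype)
```

With this shape, supporting a new target means writing one Platform subclass and registering it; MiniEngine itself does not change. Raising on an unknown platform name, rather than silently falling back to CPU, is a design choice that makes misconfiguration visible.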
Why build it (and not just read it)
Reading the real kernel/feature tells you what production does. Re-implementing a tiny version tells you why every decision was made — which is the understanding that survives into an interview or a 2 a.m. incident. Keep it small; keep it tested.
Method
- Look at the matching real code from 01-deep-dive.md.
- Add your module under mini_vllm/ (or extend an existing one).
- Write a test_*.py next to it that pins the behavior you care about.
- Run pytest mini_vllm -q and keep it green.
Definition of done
- Your component runs on CPU with no extra dependencies (numpy ok).
- A test demonstrates the property this phase is about (not just "it runs").
- You can explain, out loud, how your toy maps to the real implementation and where it intentionally simplifies.
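One way to meet the second point, sketched under assumptions: the property worth pinning is that the engine has no hardcoded device, dtype, or backend of its own. The test below plugs in a hypothetical FakeAcceleratorPlatform that reports values the engine could never guess; if any of them were hardcoded, the assertions would fail. The CpuPlatform and MiniEngine at the top are simplified, hypothetical stand-ins included only so the block runs alone; in your repo you would import your real classes instead. The file name test_platform.py is also illustrative. pytest collects the test_* functions as written, so no pytest import is needed.

```python
# test_platform.py (hypothetical file name, placed next to your platform module)
import numpy as np


# ---- Simplified stand-ins so this block runs by itself. In your repo, import
# ---- your real platform classes and engine from mini_vllm instead.
class CpuPlatform:
    device = "cpu"
    default_dtype = np.dtype(np.float32)

    def default_attn_backend(self) -> str:
        return "naive_numpy"


class MiniEngine:
    def __init__(self, platform, dtype=None, num_blocks=2, block_size=4, head_dim=8):
        # Everything hardware-specific is read from the platform object.
        self.device = platform.device
        self.dtype = np.dtype(dtype) if dtype is not None else platform.default_dtype
        self.attn_backend = platform.default_attn_backend()
        self.kv_cache = np.zeros((num_blocks, block_size, head_dim), dtype=self.dtype)


# ---- The tests themselves.
class FakeAcceleratorPlatform:
    """Reports values the engine could not guess; any hardcoded default in the
    engine would make the assertions below fail."""

    device = "fake0"
    default_dtype = np.dtype(np.float16)

    def default_attn_backend(self) -> str:
        return "fake_backend"


def test_cpu_platform_defaults():
    engine = MiniEngine(platform=CpuPlatform())
    assert engine.device == "cpu"
    assert engine.kv_cache.dtype == np.float32
    assert engine.attn_backend == "naive_numpy"


def test_engine_defaults_come_from_platform():
    # Swapping the platform changes device, dtype and backend with no engine edits.
    engine = MiniEngine(platform=FakeAcceleratorPlatform())
    assert engine.device == "fake0"
    assert engine.kv_cache.dtype == np.float16
    assert engine.attn_backend == "fake_backend"


def test_explicit_dtype_overrides_platform():
    engine = MiniEngine(platform=FakeAcceleratorPlatform(), dtype="float32")
    assert engine.kv_cache.dtype == np.float32
    assert engine.device == "fake0"  # only the dtype was overridden
```

A fake platform is a better witness than the CPU one alone: CPU defaults (cpu, float32) are exactly what a lazy engine might hardcode, so a CPU-only test could pass even if the engine never consulted the platform.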
The flagship phases ship complete mini_vllm modules and tests (mini_vllm/block_pool.py, mini_vllm/scheduler.py); use them as your reference for structure and test style.