Phase 18 — Mini-Build: extend mini_vllm
Your task
Add a metrics collector to mini_vllm (tokens/step, batch size, KV usage, preemptions) and a tiny benchmark that sweeps max_num_batched_tokens to find the throughput knee, the budget beyond which more tokens per step stop improving throughput. This is the real tuning loop in miniature.
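One possible shape for the collector is sketched below. The names StepMetrics, MetricsCollector, record_step, and summary are hypothetical and not taken from mini_vllm, and the sketch assumes the scheduler can report these four counters after every step.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class StepMetrics:
    """Counters for one scheduler step (field names are illustrative)."""
    num_tokens: int         # tokens scheduled in this step (prefill + decode)
    batch_size: int         # sequences in the batch
    kv_blocks_used: int     # KV-cache blocks in use after the step
    kv_blocks_total: int    # size of the KV-cache block pool (must be > 0)
    num_preempted: int = 0  # sequences preempted during this step


@dataclass
class MetricsCollector:
    """Accumulates per-step counters and reduces them to a small summary."""
    steps: List[StepMetrics] = field(default_factory=list)

    def record_step(self, m: StepMetrics) -> None:
        self.steps.append(m)

    def summary(self) -> Dict[str, float]:
        n = len(self.steps)
        if n == 0:
            # Nothing recorded yet: return zeros instead of dividing by zero.
            return {
                "steps": 0.0,
                "tokens_per_step": 0.0,
                "mean_batch_size": 0.0,
                "peak_kv_usage": 0.0,
                "preemptions": 0.0,
            }
        return {
            "steps": float(n),
            "tokens_per_step": sum(s.num_tokens for s in self.steps) / n,
            "mean_batch_size": sum(s.batch_size for s in self.steps) / n,
            # Peak fraction of the block pool in use across all steps.
            "peak_kv_usage": max(
                s.kv_blocks_used / s.kv_blocks_total for s in self.steps
            ),
            "preemptions": float(sum(s.num_preempted for s in self.steps)),
        }
```

Keeping the raw per-step records, rather than only running totals, makes it easy to add percentiles or plots later without touching the scheduler hook.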
Why build it (and not just read it)
Reading the real kernel/feature tells you what production does. Re-implementing a tiny version tells you why every decision was made — which is the understanding that survives into an interview or a 2 a.m. incident. Keep it small; keep it tested.
Method
- Look at the matching real code from 01-deep-dive.md.
- Add your module under mini_vllm/ (or extend an existing one).
- Write a test_*.py next to it that pins the behavior you care about.
- Run pytest mini_vllm -q and keep it green.
Definition of done
- Your component runs on CPU with no extra dependencies (numpy ok).
- A test demonstrates the property this phase is about (not just "it runs").
- You can explain, out loud, how your toy maps to the real implementation and where it intentionally simplifies.
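A test that pins the property, rather than just "it runs", might look like the sketch below. It replaces the real engine with a made-up roofline-style cost model (toy_step_time, an assumption for illustration only) so the sweep is deterministic on CPU. The names simulate_throughput, sweep, and find_knee are likewise hypothetical, and the 95% tolerance is an arbitrary choice.

```python
from typing import List, Tuple


def toy_step_time(num_tokens: int, fixed_cost: float = 1.0,
                  saturation: int = 256) -> float:
    """Assumed cost model: a step costs a flat fixed_cost until the batch
    holds `saturation` tokens, then grows linearly with the token count."""
    return fixed_cost * max(1.0, num_tokens / saturation)


def simulate_throughput(budget: int, total_tokens: int = 4096) -> float:
    """Process total_tokens in steps of at most `budget` tokens and
    return tokens per unit of simulated time."""
    if budget <= 0:
        raise ValueError("budget must be positive")
    remaining, elapsed = total_tokens, 0.0
    while remaining > 0:
        n = min(budget, remaining)
        elapsed += toy_step_time(n)
        remaining -= n
    return total_tokens / elapsed


def sweep(budgets: List[int]) -> List[Tuple[int, float]]:
    """Return (budget, throughput) pairs for each candidate token budget."""
    return [(b, simulate_throughput(b)) for b in budgets]


def find_knee(results: List[Tuple[int, float]],
              tolerance: float = 0.95) -> int:
    """Smallest budget whose throughput is within `tolerance` of the best."""
    best = max(t for _, t in results)
    return min(b for b, t in results if t >= tolerance * best)


def test_sweep_finds_knee() -> None:
    results = dict(sweep([32, 64, 128, 256, 512, 1024]))
    # Below the knee, doubling the budget still buys throughput.
    assert results[128] < results[256]
    # Past the knee, the curve is flat under this cost model.
    assert abs(results[1024] - results[256]) < 1e-9
    assert find_knee(list(results.items())) == 256
```

In the actual exercise the cost model would be replaced by timing mini_vllm steps, and the collector's tokens/step and preemption counts would help explain why the curve bends where it does.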
The flagship phases ship complete mini_vllm modules + tests (mini_vllm/block_pool.py, mini_vllm/scheduler.py). Use them as your reference for structure and test style.