Phase 18 — Mini-Build: extend `mini_vllm`

Your task
Why build it (and not just read it)
Method
Definition of done

Your task

Add a metrics collector to `mini_vllm` (tokens/step, batch size, KV-cache usage, preemptions) and a tiny benchmark that sweeps `max_num_batched_tokens` to find the throughput knee, the point past which a larger token budget stops buying more throughput. This is the real tuning loop in miniature.
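The collector described above could be sketched as follows. This is a minimal sketch under stated assumptions, not the course's reference solution: the names `StepMetrics`, `MetricsCollector`, `record`, and `summary` are hypothetical, and it assumes the scheduler can report tokens scheduled, running sequences, KV blocks in use, and preemptions once per step.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class StepMetrics:
    """One scheduler step as seen by the collector (hypothetical shape)."""

    num_tokens: int          # tokens scheduled in this step (prefill + decode)
    batch_size: int          # sequences running in this step
    kv_blocks_used: int      # KV-cache blocks allocated after the step
    kv_blocks_total: int     # total blocks in the pool
    num_preempted: int = 0   # sequences preempted during this step


@dataclass
class MetricsCollector:
    """Accumulates per-step metrics and reduces them to a small summary."""

    steps: list[StepMetrics] = field(default_factory=list)

    def record(self, step: StepMetrics) -> None:
        # Reject impossible readings early so a buggy caller fails loudly.
        if step.kv_blocks_total <= 0:
            raise ValueError("kv_blocks_total must be positive")
        if not 0 <= step.kv_blocks_used <= step.kv_blocks_total:
            raise ValueError("kv_blocks_used must be within [0, kv_blocks_total]")
        self.steps.append(step)

    def summary(self) -> dict[str, float]:
        n = len(self.steps)
        if n == 0:
            return {
                "steps": 0,
                "total_tokens": 0,
                "mean_tokens_per_step": 0.0,
                "mean_batch_size": 0.0,
                "peak_kv_usage": 0.0,
                "total_preemptions": 0,
            }
        total_tokens = sum(s.num_tokens for s in self.steps)
        return {
            "steps": n,
            "total_tokens": total_tokens,
            "mean_tokens_per_step": total_tokens / n,
            "mean_batch_size": sum(s.batch_size for s in self.steps) / n,
            # Peak fraction of the block pool in use across all steps.
            "peak_kv_usage": max(
                s.kv_blocks_used / s.kv_blocks_total for s in self.steps
            ),
            "total_preemptions": sum(s.num_preempted for s in self.steps),
        }
```

A scheduler loop would call `record(...)` once per step and read `summary()` at the end of a run. Keeping `StepMetrics` immutable means a test can build steps by hand without touching the scheduler at all.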

Why build it (and not just read it)

Reading the real kernel/feature tells you what production does. Re-implementing a tiny version tells you why every decision was made — which is the understanding that survives into an interview or a 2 a.m. incident. Keep it small; keep it tested.

Method

Look at the matching real code from `01-deep-dive.md`.
Add your module under `mini_vllm/` (or extend an existing one).
Write a `test_*.py` next to it that pins the behavior you care about.
Run `pytest mini_vllm -q` and keep it green.
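As one way to follow these steps for the benchmark half of the task, here is a hedged sketch of a sweep plus a test that pins the knee. Everything in it is an assumption made for illustration: the cost model `toy_step_time_us` is a made-up roofline (a fixed per-step cost until per-token compute dominates), not a measurement of vLLM, and "knee" is defined here as the smallest budget that reaches 95% of the best observed throughput. The function names are hypothetical.

```python
from __future__ import annotations

import math


def toy_step_time_us(num_tokens: int, fixed_us: int = 8000, per_token_us: int = 10) -> int:
    """Made-up roofline cost: a step costs at least `fixed_us`, then scales per token."""
    return max(fixed_us, num_tokens * per_token_us)


def run_backlog(total_tokens: int, max_num_batched_tokens: int) -> float:
    """Drain a token backlog under a per-step budget; return tokens per second."""
    if total_tokens <= 0 or max_num_batched_tokens <= 0:
        raise ValueError("total_tokens and max_num_batched_tokens must be positive")
    remaining = total_tokens
    elapsed_us = 0
    while remaining > 0:
        step_tokens = min(max_num_batched_tokens, remaining)
        elapsed_us += toy_step_time_us(step_tokens)
        remaining -= step_tokens
    return total_tokens * 1_000_000 / elapsed_us


def sweep(budgets: list[int], total_tokens: int = 8192) -> dict[int, float]:
    """Map each candidate budget to the throughput it achieves on the same backlog."""
    return {b: run_backlog(total_tokens, b) for b in budgets}


def find_knee(results: dict[int, float], fraction: float = 0.95) -> int:
    """Smallest budget whose throughput reaches `fraction` of the best observed."""
    best = max(results.values())
    return min(b for b, tput in results.items() if tput >= fraction * best)


def test_knee_matches_toy_roofline() -> None:
    results = sweep([64, 128, 256, 512, 1024, 2048, 4096])
    # Below the knee, doubling the budget should raise throughput.
    assert results[128] > results[64]
    # Above the knee, throughput should be flat.
    assert math.isclose(results[4096], results[1024])
    # With fixed_us=8000 and per_token_us=10 the crossover is at 800 tokens,
    # so the first swept budget past it is 1024.
    assert find_knee(results) == 1024
```

The test checks the property (throughput rises, then flattens, and the knee lands where the toy model says it should) instead of only checking that the sweep runs. Swapping `toy_step_time_us` for timings taken from the real `mini_vllm` scheduler would make the same sweep a small benchmark.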

Definition of done

Your component runs on CPU with no extra dependencies (numpy ok).
A test demonstrates the property this phase is about (not just "it runs").
You can explain, out loud, how your toy maps to the real implementation and where it intentionally simplifies.

The flagship phases ship complete `mini_vllm` modules and tests (`mini_vllm/block_pool.py`, `mini_vllm/scheduler.py`). Use them as your reference for structure and test style.

vLLM Mastery — From Zero to Maintainer
