Phase 15 — Mini-Build: extend `mini_vllm`

Your task
Why build it (and not just read it)
Method
Definition of done

Your task

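Model prefill/decode disaggregation in `mini_vllm`: run a "prefill engine" that produces KV blocks, serialize the block table and the (fake) KV, and hand both to a separate "decode engine" that continues generation. Then show that the handoff preserves the output: the disaggregated run should produce the same tokens as a single engine running the same request end to end.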
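One way the handoff could look is sketched below. Everything in it is an assumption made for illustration: `ToyEngine`, `fake_kv`, `next_token`, `export_state`, `import_state`, and the JSON payload layout are hypothetical names, not part of `mini_vllm` or vLLM. The "model" is a toy function of the cached KV, so any KV lost or reordered in transit changes the output.

```python
import json
from dataclasses import dataclass, field

BLOCK_SIZE = 4  # tokens per KV block (toy value, an assumption)


def fake_kv(token: int, pos: int) -> int:
    # Stand-in for a real KV entry: deterministic in (token, position).
    return (token * 31 + pos * 7) % 1009


def next_token(kv: list) -> int:
    # Toy "model": the next token depends on every cached entry and its order.
    return sum((i + 1) * v for i, v in enumerate(kv)) % 100


@dataclass
class ToyEngine:
    first_block_id: int = 0                          # where this engine's block ids start
    blocks: dict = field(default_factory=dict)       # block id -> list of fake KV entries
    block_table: list = field(default_factory=list)  # logical block index -> block id
    num_tokens: int = 0

    def _append(self, token: int) -> None:
        if self.num_tokens % BLOCK_SIZE == 0:        # last block is full: allocate a new one
            block_id = self.first_block_id + len(self.blocks)
            self.blocks[block_id] = []
            self.block_table.append(block_id)
        self.blocks[self.block_table[-1]].append(fake_kv(token, self.num_tokens))
        self.num_tokens += 1

    def _kv(self) -> list:
        # Read the cache through the block table, never by raw block id order.
        return [v for b in self.block_table for v in self.blocks[b]]

    def prefill(self, prompt: list) -> int:
        for token in prompt:
            self._append(token)
        return next_token(self._kv())                # first generated token

    def decode(self, first_token: int, max_tokens: int) -> list:
        out = [first_token]
        while len(out) < max_tokens:
            self._append(out[-1])
            out.append(next_token(self._kv()))
        return out

    def export_state(self) -> str:
        # Serialize the block table plus the blocks it references.
        return json.dumps({
            "num_tokens": self.num_tokens,
            "block_table": self.block_table,
            "blocks": {str(b): self.blocks[b] for b in self.block_table},
        })

    @classmethod
    def import_state(cls, payload: str, first_block_id: int = 0) -> "ToyEngine":
        state = json.loads(payload)
        engine = cls(first_block_id=first_block_id)
        # The receiver allocates its own block ids and rewrites the table.
        for i, src_id in enumerate(state["block_table"]):
            dst_id = first_block_id + i
            engine.blocks[dst_id] = list(state["blocks"][str(src_id)])
            engine.block_table.append(dst_id)
        engine.num_tokens = state["num_tokens"]
        return engine


def run_single(prompt: list, max_tokens: int) -> list:
    engine = ToyEngine()
    return engine.decode(engine.prefill(prompt), max_tokens)


def run_disaggregated(prompt: list, max_tokens: int) -> list:
    prefill_engine = ToyEngine()
    first_token = prefill_engine.prefill(prompt)
    payload = prefill_engine.export_state()          # only this string crosses the boundary
    decode_engine = ToyEngine.import_state(payload, first_block_id=100)
    return decode_engine.decode(first_token, max_tokens)
```

Comparing `run_single(prompt, n)` with `run_disaggregated(prompt, n)` for prompts that end before, on, and after a block boundary is the property to pin. The decode engine deliberately uses different block ids (`first_block_id=100`) so the check also exercises the block-table remapping, not just a byte copy.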

Why build it (and not just read it)

Reading the real kernel or feature tells you what production does. Re-implementing a tiny version tells you why each decision was made, and that understanding is what survives into an interview or a 2 a.m. incident. Keep it small; keep it tested.

Method

1. Look at the matching real code from `01-deep-dive.md`.
2. Add your module under `mini_vllm/` (or extend an existing one).
3. Write a `test_*.py` next to it that pins the behavior you care about.
4. Run `pytest mini_vllm -q` and keep it green.
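As a rough illustration of a test that pins a behavior, the sketch below checks that serialization preserves the logical KV order even when physical block ids are out of order. The helpers `pack`, `unpack`, and `logical_kv` are hypothetical stand-ins defined inline so the file runs on its own; in your own test you would import the equivalents from your module instead.

```python
import json


def pack(block_table: list, blocks: dict) -> bytes:
    # Hypothetical stand-in: serialize a block table and only the blocks it references.
    return json.dumps({
        "block_table": block_table,
        "blocks": {str(b): blocks[b] for b in block_table},
    }).encode("utf-8")


def unpack(payload: bytes):
    state = json.loads(payload.decode("utf-8"))
    # JSON object keys are strings, so convert block ids back to int.
    blocks = {int(b): kv for b, kv in state["blocks"].items()}
    return state["block_table"], blocks


def logical_kv(block_table: list, blocks: dict) -> list:
    # Flatten the cache in logical order by walking the block table.
    return [v for b in block_table for v in blocks[b]]


def test_round_trip_preserves_logical_order():
    block_table = [7, 2, 5]                                   # physical ids, out of order
    blocks = {2: [20, 21], 5: [50], 7: [70, 71], 9: [99]}     # block 9 belongs to another request
    table, received = unpack(pack(block_table, blocks))
    assert logical_kv(table, received) == [70, 71, 20, 21, 50]
    assert 9 not in received                                  # unrelated blocks are not shipped


def test_empty_request_round_trips():
    table, received = unpack(pack([], {}))
    assert table == []
    assert received == {}
```

Saved as, say, `mini_vllm/test_handoff.py` (the file name is an assumption), pytest collects it because both the file and the functions follow the `test_` naming convention.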

Definition of done

- Your component runs on CPU with no extra dependencies (numpy ok).
- A test demonstrates the property this phase is about (not just "it runs").
- You can explain, out loud, how your toy maps to the real implementation and where it intentionally simplifies.

The flagship phases ship complete `mini_vllm` modules with tests (`mini_vllm/block_pool.py`, `mini_vllm/scheduler.py`). Use them as your reference for structure and test style.

vLLM Mastery — From Zero to Maintainer
