Phase 16 — Mini-Build: extend `mini_vllm`

Your task
Why build it (and not just read it)
Method
Definition of done

Your task

Put a tiny HTTP layer over `mini_vllm` (stdlib `http.server` is fine) exposing a `/v1/completions`-shaped endpoint with streaming, plus a toy tool-call parser that extracts a JSON tool call from the output.
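One possible shape for the HTTP layer, as a minimal sketch rather than a reference solution. `fake_generate` is a hypothetical stand-in for whatever generation loop your `mini_vllm` exposes, and the response fields (`object`, `choices`, `text`) plus the `data: [DONE]` terminator are assumptions modelled on the OpenAI-style completions format, not something this phase prescribes.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


def fake_generate(prompt, max_tokens):
    """Hypothetical stand-in for mini_vllm generation: yields text pieces."""
    for word in prompt.split()[:max_tokens]:
        yield word.upper() + " "


class CompletionsHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/v1/completions":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        body = json.loads(self.rfile.read(length) or b"{}")
        pieces = fake_generate(body.get("prompt", ""), int(body.get("max_tokens", 16)))
        if body.get("stream"):
            self._stream(pieces)
        else:
            self._send_json({"object": "text_completion",
                             "choices": [{"index": 0, "text": "".join(pieces)}]})

    def _send_json(self, payload):
        data = json.dumps(payload).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

    def _stream(self, pieces):
        # The handler defaults to HTTP/1.0, so the body has no Content-Length
        # and simply ends when the server closes the connection.
        self.send_response(200)
        self.send_header("Content-Type", "text/event-stream")
        self.end_headers()
        for piece in pieces:
            chunk = {"object": "text_completion",
                     "choices": [{"index": 0, "text": piece}]}
            self.wfile.write(f"data: {json.dumps(chunk)}\n\n".encode())
            self.wfile.flush()
        self.wfile.write(b"data: [DONE]\n\n")

    def log_message(self, *args):
        pass  # keep test output quiet


def post(port, payload):
    """POST a JSON payload to the local server and return the raw body text."""
    req = urllib.request.Request(
        f"http://127.0.0.1:{port}/v1/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    # Empty ProxyHandler: never route localhost traffic through a proxy.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(req) as resp:
        return resp.read().decode()


def demo():
    """Start the server on a free port, hit it once per mode, shut it down."""
    server = ThreadingHTTPServer(("127.0.0.1", 0), CompletionsHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    port = server.server_address[1]
    try:
        request = {"prompt": "hello tiny server", "max_tokens": 8}
        full = json.loads(post(port, request))
        streamed = post(port, {**request, "stream": True})
        return full, streamed
    finally:
        server.shutdown()
        server.server_close()
```

Binding to port 0 lets the OS pick a free port, which keeps a test built on `demo()` from colliding with anything else on the machine. Swapping `fake_generate` for a real call into your engine should be the only change the handler needs.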

Why build it (and not just read it)

Reading the real kernel/feature tells you what production does. Re-implementing a tiny version tells you why every decision was made — which is the understanding that survives into an interview or a 2 a.m. incident. Keep it small; keep it tested.

Method

Look at the matching real code from `01-deep-dive.md`.
Add your module under `mini_vllm/` (or extend an existing one).
Write a `test_*.py` next to it that pins the behavior you care about.
Run `pytest mini_vllm -q` and keep it green.
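To make the steps above concrete for the tool-call half of the task, here is a hedged sketch of a toy parser together with the kind of tests that pin its behavior. The function name `extract_tool_call` and the `{"name": ..., "arguments": {...}}` shape are assumptions chosen for illustration; in your tree the function would live in its own module with the tests in a `test_*.py` beside it.

```python
import json


def extract_tool_call(text):
    """Return the first JSON object in `text` that has a "name" key, else None.

    Assumed toy format: {"name": ..., "arguments": {...}} embedded in free text.
    """
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at `start` and
            # ignores whatever trails it, which suits prose-wrapped output.
            obj, _end = decoder.raw_decode(text, start)
        except json.JSONDecodeError:
            obj = None
        if isinstance(obj, dict) and "name" in obj:
            return {"name": obj["name"], "arguments": obj.get("arguments", {})}
        start = text.find("{", start + 1)
    return None


def test_extracts_call_surrounded_by_prose():
    out = 'Sure. {"name": "get_weather", "arguments": {"city": "Oslo"}} Done.'
    assert extract_tool_call(out) == {
        "name": "get_weather",
        "arguments": {"city": "Oslo"},
    }


def test_plain_text_has_no_call():
    assert extract_tool_call("no braces here") is None


def test_skips_malformed_and_non_call_json():
    out = 'oops {not json} and {"x": 1} then {"name": "ping"}'
    assert extract_tool_call(out) == {"name": "ping", "arguments": {}}
```

The third test is the one that earns its keep: it fixes what happens on malformed braces and on valid JSON that is not a tool call, which is where a toy parser is most likely to drift.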

Definition of done

Your component runs on CPU with no extra dependencies (`numpy` is OK).
A test demonstrates the property this phase is about (not just "it runs").
You can explain, out loud, how your toy maps to the real implementation and where it intentionally simplifies.
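As one illustration of a test that demonstrates a property rather than "it runs", the sketch below checks that streamed chunks reassemble into exactly the text a non-streaming call would return, even when a chunk itself contains blank lines. The helper names `to_sse` and `from_sse` are hypothetical, and the `data: ...` framing with a `[DONE]` sentinel is an assumption borrowed from the OpenAI-style streaming convention.

```python
import json


def to_sse(pieces):
    """Encode text pieces as 'data:' events, ending with a [DONE] sentinel."""
    for piece in pieces:
        payload = json.dumps({"choices": [{"index": 0, "text": piece}]})
        yield "data: " + payload + "\n\n"
    yield "data: [DONE]\n\n"


def from_sse(stream_text):
    """Decode the events produced by to_sse back into the list of pieces."""
    pieces = []
    for event in stream_text.split("\n\n"):
        if not event.startswith("data: "):
            continue
        payload = event[len("data: "):]
        if payload == "[DONE]":
            break
        pieces.append(json.loads(payload)["choices"][0]["text"])
    return pieces


def test_stream_reassembles_to_full_text():
    # The last piece contains a blank line on purpose: JSON escaping must
    # keep it from being mistaken for an event boundary.
    pieces = ["Hel", "lo", ", wor", "ld\n\n!"]
    wire = "".join(to_sse(pieces))
    assert from_sse(wire) == pieces
    assert "".join(from_sse(wire)) == "Hello, world\n\n!"
    assert wire.endswith("data: [DONE]\n\n")
```

Because the framing is tested without a socket, the test stays fast and deterministic; the HTTP handler then only has to write what `to_sse` yields.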

The flagship phases ship complete `mini_vllm` modules and tests (`mini_vllm/block_pool.py`, `mini_vllm/scheduler.py`). Use them as your reference for structure and test style.

vLLM Mastery — From Zero to Maintainer
