Phase 16 — Mini-Build: extend mini_vllm
Your task
Wrap mini_vllm in a tiny HTTP layer (the stdlib http.server module is fine) that exposes an endpoint shaped like the OpenAI-compatible /v1/completions API, with streaming. Add a toy tool-call parser that extracts a JSON tool call from the model's output.
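Here is a minimal sketch of the HTTP half. It assumes a hypothetical fake_generate stand-in for mini_vllm's generation loop; swap in your real engine call, whose API is not shown here. The sketch uses only http.server and json. It emits simplified text_completion chunks as server-sent events, which are "data: ..." lines each followed by a blank line, and it ends the stream with "data: [DONE]". The request and response fields are a reduced subset of the /v1/completions shape, not the full schema.

```python
import json
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


def fake_generate(prompt, max_tokens):
    """Hypothetical stand-in for mini_vllm: yields one text piece per 'token'."""
    for word in ("echo: " + prompt).split()[:max_tokens]:
        yield word + " "


def completion_chunk(text, finish_reason=None):
    # Simplified OpenAI-style completions body: one choice, no usage stats.
    return {"object": "text_completion",
            "choices": [{"index": 0, "text": text, "finish_reason": finish_reason}]}


class CompletionsHandler(BaseHTTPRequestHandler):
    # Default protocol_version is HTTP/1.0: the connection closes after each
    # response, and that close is what tells the client the stream has ended.

    def do_POST(self):
        if self.path != "/v1/completions":
            self.send_error(404, "unknown endpoint")
            return
        length = int(self.headers.get("Content-Length", 0))
        body = json.loads(self.rfile.read(length) or b"{}")
        pieces = fake_generate(body.get("prompt", ""), int(body.get("max_tokens", 16)))

        if body.get("stream"):
            self.send_response(200)
            self.send_header("Content-Type", "text/event-stream")
            self.send_header("Cache-Control", "no-cache")
            self.end_headers()
            for piece in pieces:
                # One server-sent event per generated piece, flushed right away.
                event = f"data: {json.dumps(completion_chunk(piece))}\n\n"
                self.wfile.write(event.encode())
                self.wfile.flush()
            self.wfile.write(b"data: [DONE]\n\n")
            return

        # Toy: always "stop"; a real engine reports "length" when max_tokens is hit.
        payload = json.dumps(completion_chunk("".join(pieces), "stop")).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    def log_message(self, format, *args):
        pass  # silence per-request logging


if __name__ == "__main__":
    ThreadingHTTPServer(("127.0.0.1", 8000), CompletionsHandler).serve_forever()
```

BaseHTTPRequestHandler answers with HTTP/1.0 by default, so the closed connection delimits the streamed body and no chunked transfer encoding is needed. You can try it with curl -N -d '{"prompt": "hi", "stream": true}' http://127.0.0.1:8000/v1/completions.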
Why build it (and not just read it)
Reading the real kernel/feature tells you what production does. Re-implementing a tiny version tells you why every decision was made — which is the understanding that survives into an interview or a 2 a.m. incident. Keep it small; keep it tested.
Method
- Look at the matching real code from 01-deep-dive.md.
- Add your module under mini_vllm/ (or extend an existing one).
- Write a test_*.py next to it that pins the behavior you care about.
- Run pytest mini_vllm -q and keep it green.
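Applied to the tool-call half of the task, those steps might produce something like the following. Two things here are assumptions. The module would be a hypothetical mini_vllm/tool_parser.py. The toy convention treats any JSON object with a string "name" and an object "arguments" as a tool call, even when it is wrapped in prose or tags such as <tool_call>. This is not any particular model's format. The module and its tests appear in one listing for brevity; in the repo, the test_* functions would live in test_tool_parser.py next to the module.

```python
import json
from typing import Any, Dict, Optional


def parse_tool_call(text: str) -> Optional[Dict[str, Any]]:
    """Return the first JSON object in `text` shaped like a tool call, else None.

    Toy convention (an assumption): {"name": <str>, "arguments": <object>},
    possibly surrounded by free text or markup.
    """
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value and ignores whatever follows it.
            obj, _ = decoder.raw_decode(text[start:])
        except json.JSONDecodeError:
            obj = None
        if (isinstance(obj, dict)
                and isinstance(obj.get("name"), str)
                and isinstance(obj.get("arguments"), dict)):
            return {"name": obj["name"], "arguments": obj["arguments"]}
        start = text.find("{", start + 1)  # try the next candidate brace
    return None


# --- test_tool_parser.py (pytest collects these; they also run as plain calls) ---

def test_extracts_call_wrapped_in_prose_and_tags():
    out = 'Sure. <tool_call>{"name": "get_weather", "arguments": {"city": "Paris"}}</tool_call>'
    assert parse_tool_call(out) == {"name": "get_weather", "arguments": {"city": "Paris"}}


def test_skips_json_that_is_not_a_tool_call():
    out = '{"note": 1} then {"name": "add", "arguments": {"a": 1, "b": 2}}'
    assert parse_tool_call(out) == {"name": "add", "arguments": {"a": 1, "b": 2}}


def test_plain_or_truncated_output_yields_none():
    assert parse_tool_call("The answer is {not json}.") is None
    assert parse_tool_call('{"name": "add", "arguments": {"a": 1') is None
```

Scanning with raw_decode instead of a regex means nested braces inside "arguments" are handled by a real JSON parser. A truncated call, which is common when generation stops early, is rejected rather than half-parsed.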
Definition of done
- Your component runs on CPU with no extra dependencies (numpy ok).
- A test demonstrates the property this phase is about (not just "it runs").
- You can explain, out loud, how your toy maps to the real implementation and where it intentionally simplifies.
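For streaming, one property worth pinning is incremental delivery. Text should reach the client while generation is still running, and the streamed deltas should add up to exactly the full completion. The tests below pin both. They assume a hypothetical stream_completion helper that lazily turns any iterator of text pieces into server-sent-event strings, which is the kind of function an HTTP handler would write to the socket.

```python
import json


def stream_completion(pieces):
    """Hypothetical helper: map text pieces to SSE 'data:' events, lazily."""
    for piece in pieces:
        chunk = {"object": "text_completion",
                 "choices": [{"index": 0, "text": piece, "finish_reason": None}]}
        yield f"data: {json.dumps(chunk)}\n\n"
    yield "data: [DONE]\n\n"


def event_text(event):
    # Strip the "data: " prefix and blank-line terminator; return the delta text.
    return json.loads(event[len("data: "):].strip())["choices"][0]["text"]


def test_first_event_arrives_before_generation_finishes():
    produced = []

    def tracking_generate():
        for piece in ["Hel", "lo", "!"]:
            produced.append(piece)  # records how far "generation" has run
            yield piece

    events = stream_completion(tracking_generate())
    first = next(events)
    assert event_text(first) == "Hel"
    assert produced == ["Hel"]  # nothing beyond the first piece generated yet


def test_deltas_concatenate_to_full_text_and_stream_terminates():
    events = list(stream_completion(iter(["Hel", "lo", "!"])))
    assert events[-1] == "data: [DONE]\n\n"
    assert "".join(event_text(e) for e in events[:-1]) == "Hello!"
```

A test that only checks the final string would still pass if the server buffered the whole completion before sending anything. The first test fails in that case, which is why it demonstrates the property rather than just "it runs".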
The flagship phases ship complete mini_vllm modules with tests (mini_vllm/block_pool.py, mini_vllm/scheduler.py); use them as your reference for structure and test style.