Phase 16 — Mini-Build: extend mini_vllm

Your task

Put a tiny HTTP layer over mini_vllm (stdlib http.server is fine) exposing a /v1/completions-shaped endpoint with streaming, plus a toy tool-call parser that extracts a JSON tool call from the output.
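One possible shape for the HTTP layer, as a minimal sketch rather than a reference solution. It assumes a hypothetical `fake_generate` stand-in for the mini_vllm generation loop (swap in the real call), a loosely OpenAI-style response body, and server-sent events terminated by a `data: [DONE]` sentinel for streaming. The names `stream_events`, `CompletionsHandler`, and `make_server` are illustrative and not part of mini_vllm.

```python
import json
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


def fake_generate(prompt, max_tokens=4):
    """Hypothetical stand-in for mini_vllm's decode loop: yields text pieces."""
    for i in range(max_tokens):
        yield f" tok{i}"


def completion_body(text):
    """Assumed response shape, loosely modeled on /v1/completions."""
    return {"object": "text_completion", "choices": [{"index": 0, "text": text}]}


def stream_events(pieces):
    """Yield one SSE event (bytes) per generated piece, then a [DONE] sentinel."""
    for piece in pieces:
        yield f"data: {json.dumps(completion_body(piece))}\n\n".encode("utf-8")
    yield b"data: [DONE]\n\n"


class CompletionsHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/v1/completions":
            self.send_error(404, "unknown endpoint")
            return
        length = int(self.headers.get("Content-Length", 0))
        try:
            body = json.loads(self.rfile.read(length) or b"{}")
        except json.JSONDecodeError:
            body = None
        if not isinstance(body, dict):
            self.send_error(400, "request body must be a JSON object")
            return
        pieces = fake_generate(body.get("prompt", ""), int(body.get("max_tokens", 4)))
        if body.get("stream"):
            self._send_stream(pieces)
        else:
            self._send_json(completion_body("".join(pieces)))

    def _send_stream(self, pieces):
        # The handler speaks HTTP/1.0 by default, so the connection closes when
        # the method returns; that close marks the end of the stream.
        self.send_response(200)
        self.send_header("Content-Type", "text/event-stream")
        self.send_header("Cache-Control", "no-cache")
        self.end_headers()
        for event in stream_events(pieces):
            self.wfile.write(event)
            self.wfile.flush()  # push each piece out as soon as it exists

    def _send_json(self, obj):
        data = json.dumps(obj).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

    def log_message(self, format, *args):
        pass  # silence per-request logging so test output stays readable


def make_server(port=0):
    """Bind to localhost; port 0 asks the OS for a free port (handy in tests)."""
    return ThreadingHTTPServer(("127.0.0.1", port), CompletionsHandler)
```

To try it by hand, call `make_server(8000).serve_forever()` and send a request such as `curl -N -X POST localhost:8000/v1/completions -d '{"prompt": "hi", "stream": true}'`. Keeping the event framing in `stream_events`, separate from the handler, lets a test check the streaming format without opening a socket.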

Why build it (and not just read it)

Reading the real kernel/feature tells you what production does. Re-implementing a tiny version tells you why each decision was made, and that understanding is what survives into an interview or a 2 a.m. incident. Keep it small; keep it tested.

Method

  1. Look at the matching real code from 01-deep-dive.md.
  2. Add your module under mini_vllm/ (or extend an existing one).
  3. Write a test_*.py next to it that pins the behavior you care about.
  4. Run pytest mini_vllm -q and keep it green.
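
Steps 2 and 3 could look like the sketch below for the tool-call parser. The file names `tool_parser.py` and `test_tool_parser.py`, the function `parse_tool_call`, and the `{"name": ..., "arguments": {...}}` call format are all assumptions made for illustration; real parsers handle model-specific formats. Both files are shown in one block so the example runs as-is; in the repository the tests would import from the module.

```python
# --- mini_vllm/tool_parser.py (hypothetical module name) ---
import json

_DECODER = json.JSONDecoder()


def parse_tool_call(text):
    """Return the first embedded {"name": str, "arguments": dict} object, else None.

    Toy strategy: try to decode a JSON value at every '{' in the text and
    accept the first one that has the assumed tool-call shape.
    """
    pos = text.find("{")
    while pos != -1:
        try:
            obj, _end = _DECODER.raw_decode(text, pos)
        except json.JSONDecodeError:
            obj = None  # not valid JSON from this brace; try the next one
        if (isinstance(obj, dict)
                and isinstance(obj.get("name"), str)
                and isinstance(obj.get("arguments"), dict)):
            return {"name": obj["name"], "arguments": obj["arguments"]}
        pos = text.find("{", pos + 1)
    return None


# --- mini_vllm/test_tool_parser.py (hypothetical test file) ---
def test_extracts_call_surrounded_by_prose():
    out = 'Sure. {"name": "get_weather", "arguments": {"city": "Paris"}} Done.'
    assert parse_tool_call(out) == {
        "name": "get_weather",
        "arguments": {"city": "Paris"},
    }


def test_plain_text_has_no_call():
    assert parse_tool_call("The weather in Paris is mild.") is None


def test_skips_json_that_is_not_a_tool_call():
    out = 'meta {"a": 1} then {"name": "f", "arguments": {}}'
    assert parse_tool_call(out) == {"name": "f", "arguments": {}}


def test_truncated_call_is_rejected():
    # A stream cut off mid-object must not produce a half-parsed call.
    assert parse_tool_call('{"name": "f", "arguments": {"x": ') is None
```

The tests pin behavior at the edges (prose around the call, unrelated JSON, truncated output), since that is where a toy parser and a production one are most likely to differ.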

Definition of done

  • Your component runs on CPU with no extra dependencies (numpy ok).
  • A test demonstrates the property this phase is about (not just "it runs").
  • You can explain, out loud, how your toy maps to the real implementation and where it intentionally simplifies.
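
As an illustration of the second bullet, a property test for streaming might assert that the streamed pieces, concatenated, equal the non-streamed completion, and that they really arrive as separate events. This is a self-contained sketch: `fake_pieces`, `complete`, and `stream` are hypothetical stand-ins for whatever your HTTP layer calls, and the SSE framing with a `[DONE]` sentinel is an assumed convention.

```python
import json


def fake_pieces(prompt, max_tokens):
    """Hypothetical deterministic generator standing in for mini_vllm decoding."""
    for i in range(max_tokens):
        yield f"{prompt[:1]}{i} "


def complete(prompt, max_tokens):
    """Non-streaming path: the whole completion as one string."""
    return "".join(fake_pieces(prompt, max_tokens))


def stream(prompt, max_tokens):
    """Streaming path: one SSE-framed event per piece, then a sentinel."""
    for piece in fake_pieces(prompt, max_tokens):
        payload = {"choices": [{"index": 0, "text": piece}]}
        yield "data: " + json.dumps(payload) + "\n\n"
    yield "data: [DONE]\n\n"


def test_stream_matches_non_stream():
    events = list(stream("hello", 5))
    assert events[-1] == "data: [DONE]\n\n"
    deltas = [
        json.loads(e[len("data: "):])["choices"][0]["text"] for e in events[:-1]
    ]
    # One event per piece: the response is streamed, not sent as one blob.
    assert len(deltas) == 5
    # Same text on both paths: streaming changes delivery, not content.
    assert "".join(deltas) == complete("hello", 5)
```

A test that only checked for a 200 response would pass even if the server buffered everything and sent it as a single event; this one would not.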

The flagship phases ship complete mini_vllm modules with tests (mini_vllm/block_pool.py, mini_vllm/scheduler.py). Use them as your reference for structure and test style.