Phase 01 — Mini-Build: trace the request lifecycle

You'll add lifecycle tracing to mini_vllm so you can see a request move through WAITING → RUNNING → FINISHED, with its num_computed_tokens/num_tokens at every step. Seeing the state machine run is how the architecture stops being abstract.

The task (lab-01)

Implement trace_request(engine_kwargs, prompt, sampling_params) -> list[Event] that runs the mini_vllm engine one step() at a time and records, after each step, every live request's (request_id, status, num_computed_tokens, num_tokens). Then derive:

  • the first event (should be RUNNING with num_computed_tokens == num_prompt_tokens after prefill),
  • the sequence of statuses (RUNNING…→FINISHED),
  • that num_computed_tokens is monotonically non-decreasing until finish.
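The three derived checks can be sketched as one small helper. This is only an illustration: the Event field types, the helper name check_trace, and the hand-written sample trace are assumptions, not part of the lab's reference solution, and the helper assumes the trace holds a single request.

```python
from typing import NamedTuple


class Event(NamedTuple):
    # Field names follow the tuple in the task; the types are assumed.
    request_id: str
    status: str
    num_computed_tokens: int
    num_tokens: int


def check_trace(events, num_prompt_tokens):
    """Summarise the three properties for a single-request trace."""
    first = events[0]
    statuses = [e.status for e in events]
    computed = [e.num_computed_tokens for e in events]
    return {
        # First event: RUNNING, with the whole prompt already computed.
        "first_is_prefilled": (
            first.status == "RUNNING"
            and first.num_computed_tokens == num_prompt_tokens
        ),
        # Status sequence: zero or more RUNNING, then exactly one FINISHED.
        "statuses": statuses,
        "ends_finished": (
            statuses[-1] == "FINISHED"
            and all(s == "RUNNING" for s in statuses[:-1])
        ),
        # Counter never goes backwards between consecutive events.
        "monotonic": all(a <= b for a, b in zip(computed, computed[1:])),
    }


# Hand-written sample trace (illustrative values, not real engine output).
sample = [
    Event("req-0", "RUNNING", 4, 5),
    Event("req-0", "RUNNING", 5, 6),
    Event("req-0", "FINISHED", 6, 7),
]
summary = check_trace(sample, num_prompt_tokens=4)
```

Returning a summary dict rather than asserting inside the helper keeps it usable both in a test and in an interactive session where you want to see which property failed.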

You're reconstructing, on your own engine, what VLLM_LOGGING_LEVEL=DEBUG shows you on the real one (lab-02). Map each transition to EngineCore.step (core.py:428).

Method

mini_vllm.LLMEngine exposes scheduler (with .running/.waiting) and step(). Drive the loop manually:

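from mini_vllm import LLMEngine

eng = LLMEngine(**engine_kwargs)
rid = eng.add_request(prompt, sampling_params)
events = []
while eng.scheduler.has_unfinished_requests():
    outputs = eng.step()  # keep the return value: finished requests are reported here
    # Snapshot every request that is still running after this step.
    for r in eng.scheduler.running:
        events.append(Event(r.request_id, r.status.name, r.num_computed_tokens, r.num_tokens))
    # A request that finished during this step may no longer be in scheduler.running,
    # so also capture finished requests from the step return value (outputs).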
(The exact capture is the lab's job; the test pins the resulting trace's shape.)
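To see why scanning scheduler.running alone can miss the FINISHED event, here is a self-contained toy stand-in. It is not mini_vllm: ToyEngine, ToyRequest, max_new_tokens, and the convention that step() returns the list of requests finished in that step are all assumptions made for illustration.

```python
from enum import Enum, auto
from typing import NamedTuple


class Status(Enum):
    WAITING = auto()
    RUNNING = auto()
    FINISHED = auto()


class Event(NamedTuple):
    request_id: str
    status: str
    num_computed_tokens: int
    num_tokens: int


class ToyRequest:
    def __init__(self, request_id, num_prompt_tokens, max_new_tokens):
        self.request_id = request_id
        self.status = Status.WAITING
        self.num_prompt_tokens = num_prompt_tokens
        self.max_new_tokens = max_new_tokens
        self.num_computed_tokens = 0
        self.num_tokens = num_prompt_tokens  # prompt only, nothing generated yet


class ToyEngine:
    """Hypothetical stand-in: whole-prompt prefill, then one token per step."""

    def __init__(self):
        self.waiting, self.running = [], []

    def add_request(self, req):
        self.waiting.append(req)

    def has_unfinished_requests(self):
        return bool(self.waiting or self.running)

    def step(self):
        # Admit everything that is waiting: WAITING -> RUNNING.
        while self.waiting:
            req = self.waiting.pop(0)
            req.status = Status.RUNNING
            self.running.append(req)
        finished = []
        for req in list(self.running):
            req.num_computed_tokens = req.num_tokens  # compute all known tokens
            req.num_tokens += 1                       # sample one new token
            if req.num_tokens - req.num_prompt_tokens >= req.max_new_tokens:
                req.status = Status.FINISHED          # RUNNING -> FINISHED
                self.running.remove(req)              # leaves the running list
                finished.append(req)
        return finished


def trace(engine):
    events = []
    while engine.has_unfinished_requests():
        finished = engine.step()
        # Record both the still-running and the just-finished requests.
        for req in engine.running + finished:
            events.append(Event(req.request_id, req.status.name,
                                req.num_computed_tokens, req.num_tokens))
    return events


engine = ToyEngine()
engine.add_request(ToyRequest("req-0", num_prompt_tokens=4, max_new_tokens=3))
events = trace(engine)
for e in events:
    print(e.status, e.num_computed_tokens, e.num_tokens)
# RUNNING 4 5
# RUNNING 5 6
# FINISHED 6 7
```

If the loop in trace iterated over engine.running only, the last line would never be recorded, because the toy engine removes the request from running in the same step that marks it FINISHED.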

Definition of done

pytest phase-01-architecture-and-request-lifecycle/labs -q

Then answer: at which step does num_computed_tokens first equal num_prompt_tokens (prefill done)? After that, how much does it grow per step (decode = 1)? Why does that match the prefill/decode model from Phase 0?
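One way to read the answers off a recorded trace is sketched below. The helper name prefill_and_decode_growth is hypothetical and the input lists are made-up values, not output from mini_vllm; the function only looks at the per-step num_computed_tokens of a single request.

```python
def prefill_and_decode_growth(computed, num_prompt_tokens):
    """Return (prefill_step, deltas) for one request.

    computed: num_computed_tokens recorded after step 1, 2, 3, ...
    prefill_step: 1-based step at which the counter first equals the prompt
                  length, or None if that never happens.
    deltas: growth of the counter per step after prefill is done.
    """
    prefill_step = next(
        (i for i, c in enumerate(computed, start=1) if c == num_prompt_tokens),
        None,
    )
    if prefill_step is None:
        return None, []
    tail = computed[prefill_step - 1:]
    deltas = [b - a for a, b in zip(tail, tail[1:])]
    return prefill_step, deltas


# Whole prompt prefilled in the first step, then one token per decode step.
step_a, deltas_a = prefill_and_decode_growth([4, 5, 6], num_prompt_tokens=4)
print(step_a, deltas_a)  # 1 [1, 1]

# Hypothetical engine that splits prefill over two steps.
step_b, deltas_b = prefill_and_decode_growth([2, 4, 5, 6], num_prompt_tokens=4)
print(step_b, deltas_b)  # 2 [1, 1]
```

In both made-up traces the counter jumps by many tokens while the prompt is processed and by exactly one token per step afterwards, which is the pattern the question asks you to confirm on your own trace.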

Map to the real engine

| your trace | real vLLM |
|---|---|
| status transitions | RequestStatus (request.py:315) |
| per-step counter advance | update_from_output (scheduler.py:1283) |
| the loop you drive | EngineCore.step (core.py:428) |
| reading scheduler.running | the real Scheduler.running list |