Phase 01 — Mini-Build: trace the request lifecycle

You'll add lifecycle tracing to mini_vllm so you can see a request move through WAITING → RUNNING → FINISHED, with its num_computed_tokens/num_tokens at every step. Seeing the state machine run is how the architecture stops being abstract.

The task (lab-01)

Implement trace_request(engine_kwargs, prompt, sampling_params) -> list[Event] that runs the mini_vllm engine one step() at a time and records, after each step, every live request's (request_id, status, num_computed_tokens, num_tokens). Then derive:

- the first event, which should be RUNNING with num_computed_tokens == num_prompt_tokens once prefill has run;
- the sequence of statuses (RUNNING … → FINISHED);
- a check that num_computed_tokens is monotonically non-decreasing until the request finishes.
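
The three derived checks can be sketched as plain functions over a recorded trace. This is a minimal sketch, not the lab's reference solution: the `Event` layout mirrors the fields listed in the task, but its field names, the helper names and the hand-written sample trace are assumptions for illustration only.

```python
from typing import NamedTuple


class Event(NamedTuple):
    # Assumed layout; the lab's own Event type may differ.
    request_id: str
    status: str
    num_computed_tokens: int
    num_tokens: int


def first_event_ok(events: list[Event], num_prompt_tokens: int) -> bool:
    """First recorded event should be RUNNING with the whole prompt computed."""
    first = events[0]
    return first.status == "RUNNING" and first.num_computed_tokens == num_prompt_tokens


def status_sequence(events: list[Event]) -> list[str]:
    """Collapse consecutive duplicates, e.g. RUNNING, RUNNING, FINISHED -> RUNNING, FINISHED."""
    seq: list[str] = []
    for e in events:
        if not seq or seq[-1] != e.status:
            seq.append(e.status)
    return seq


def is_non_decreasing(events: list[Event]) -> bool:
    """True if num_computed_tokens never drops from one event to the next."""
    counts = [e.num_computed_tokens for e in events]
    return all(a <= b for a, b in zip(counts, counts[1:]))


# Hand-written illustrative trace (not real engine output): a 4-token prompt.
sample = [
    Event("req-0", "RUNNING", 4, 5),
    Event("req-0", "RUNNING", 5, 6),
    Event("req-0", "FINISHED", 6, 7),
]
```

If a trace holds several requests, filter it down to one request_id before applying these helpers, since the checks only make sense per request.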

You're reconstructing, on your own engine, what VLLM_LOGGING_LEVEL=DEBUG shows you on the real one (lab-02). Map each transition to EngineCore.step (core.py:428).

Method

mini_vllm.LLMEngine exposes scheduler (with .running/.waiting) and step(). Drive the loop manually:

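from mini_vllm import LLMEngine

eng = LLMEngine(**engine_kwargs)
rid = eng.add_request(prompt, sampling_params)
events = []
while eng.scheduler.has_unfinished_requests():
    eng.step()
    # Snapshot every request still in the running queue after this step.
    for r in eng.scheduler.running:
        events.append(Event(r.request_id, r.status.name, r.num_computed_tokens, r.num_tokens))
    # TODO (lab): also record requests that finished during this step, using
    # the value step() returns; they may no longer be in scheduler.running.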

(The exact capture is the lab's job; the test pins the resulting trace's shape.)
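
One wrinkle in the loop above is that a request which finishes during a step may already have left `scheduler.running` by the time you read it, which is why the comment points at the step return value. The sketch below illustrates that with a toy stand-in engine. Everything in it (`ToyEngine`, `ToyRequest`, and the assumption that `step()` returns the requests that finished in that step) is hypothetical and is not mini_vllm's actual API; check the real return type before relying on it.

```python
from dataclasses import dataclass


@dataclass
class ToyRequest:
    request_id: str
    num_prompt_tokens: int
    max_new_tokens: int
    status: str = "WAITING"
    num_computed_tokens: int = 0
    num_tokens: int = 0

    def __post_init__(self) -> None:
        # Before any step, the request consists of its prompt only.
        self.num_tokens = self.num_prompt_tokens


class ToyEngine:
    """Hypothetical stand-in: whole-prompt prefill, then one token per step."""

    def __init__(self) -> None:
        self.waiting: list[ToyRequest] = []
        self.running: list[ToyRequest] = []

    def has_unfinished_requests(self) -> bool:
        return bool(self.waiting or self.running)

    def step(self) -> list[ToyRequest]:
        # Admit every waiting request (a real scheduler would apply a budget).
        while self.waiting:
            r = self.waiting.pop(0)
            r.status = "RUNNING"
            self.running.append(r)
        finished = []
        for r in self.running:
            # All tokens known so far are computed, then one new token is sampled.
            r.num_computed_tokens = r.num_tokens
            r.num_tokens += 1
            if r.num_tokens - r.num_prompt_tokens >= r.max_new_tokens:
                r.status = "FINISHED"
                finished.append(r)
        # Finished requests leave the running queue before step() returns.
        self.running = [r for r in self.running if r.status != "FINISHED"]
        return finished


def trace(engine: ToyEngine) -> list[tuple[str, str, int, int]]:
    events = []
    while engine.has_unfinished_requests():
        finished = engine.step()
        # Record the survivors and the requests that finished in this step.
        for r in engine.running + finished:
            events.append((r.request_id, r.status, r.num_computed_tokens, r.num_tokens))
    return events


eng = ToyEngine()
eng.waiting.append(ToyRequest("req-0", num_prompt_tokens=4, max_new_tokens=3))
events = trace(eng)
```

Reading only `engine.running` in `trace` would drop the final FINISHED event here, because the toy engine has already removed the request from the queue when `step()` returns.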

Definition of done

pytest phase-01-architecture-and-request-lifecycle/labs -q

Then answer: At which step does num_computed_tokens first equal num_prompt_tokens (prefill done)? After that, by how much does it grow per step (in decode it should be 1)? Why does that match the prefill/decode model from Phase 0?
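
To answer these questions from a trace rather than by eye, you can compute the first step at which the counter reaches the prompt length and the per-step differences after it. A minimal sketch, assuming you have already pulled one request's num_computed_tokens values into a list in step order; the function names and the sample numbers are illustrative, not output from a real run.

```python
from typing import Optional


def prefill_done_step(computed: list[int], num_prompt_tokens: int) -> Optional[int]:
    """0-based index of the first step where the whole prompt has been computed."""
    for step, n in enumerate(computed):
        if n >= num_prompt_tokens:
            return step
    return None


def per_step_growth(computed: list[int]) -> list[int]:
    """Difference between each recorded counter value and the previous one."""
    return [b - a for a, b in zip(computed, computed[1:])]


computed = [4, 5, 6, 7]  # illustrative: 4-token prompt, then decode steps
done = prefill_done_step(computed, 4)
growth = per_step_growth(computed[done:])
```

In this illustrative trace, prefill completes at the first recorded step and every later step adds exactly one token, which is the pattern the questions ask you to explain.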

Map to the real engine

| Your trace | Real vLLM |
| --- | --- |
| status transitions | `RequestStatus` (`request.py:315`) |
| per-step counter advance | `update_from_output` (`scheduler.py:1283`) |
| the loop you drive | `EngineCore.step` (`core.py:428`) |
| reading `scheduler.running` | the real `Scheduler.running` list |

vLLM Mastery — From Zero to Maintainer
