Phase 01 — Mini-Build: trace the request lifecycle
You'll add lifecycle tracing to mini_vllm so you can see a request move through
WAITING → RUNNING → FINISHED, with its num_computed_tokens/num_tokens at every step. Seeing
the state machine run is how the architecture stops being abstract.
The task (lab-01)
Implement trace_request(engine_kwargs, prompt, sampling_params) -> list[Event] that runs the
mini_vllm engine one step() at a time and records, after each step, every live request's
(request_id, status, num_computed_tokens, num_tokens). Then derive:
- the first event (should be RUNNING with num_computed_tokens == num_prompt_tokens after prefill),
- the sequence of statuses (RUNNING…→FINISHED),
- that num_computed_tokens is monotonically non-decreasing until finish.
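The three derived checks can be sketched as small helpers over the recorded events. This is only an illustration: the `Event` NamedTuple below is an assumed shape (the lab fixes the four fields, not the type), the helper names are hypothetical, and the sample trace is hand-written rather than produced by mini_vllm.

```python
from typing import NamedTuple


class Event(NamedTuple):
    # Assumed shape: the four fields named in the task, nothing more.
    request_id: str
    status: str
    num_computed_tokens: int
    num_tokens: int


def status_sequence(events, request_id):
    """Statuses of one request, in the order they were recorded."""
    return [e.status for e in events if e.request_id == request_id]


def is_non_decreasing(events, request_id):
    """True if num_computed_tokens never drops between consecutive events."""
    counts = [e.num_computed_tokens for e in events if e.request_id == request_id]
    return all(a <= b for a, b in zip(counts, counts[1:]))


# Hand-written trace for a 4-token prompt; the numbers are illustrative only.
NUM_PROMPT_TOKENS = 4
trace = [
    Event("r0", "RUNNING", 4, 5),
    Event("r0", "RUNNING", 5, 6),
    Event("r0", "FINISHED", 6, 7),
]

first = trace[0]
prefill_ok = first.status == "RUNNING" and first.num_computed_tokens == NUM_PROMPT_TOKENS
```

Keeping the checks as pure functions over a list of events means they can be exercised on a hand-written trace before the engine loop exists.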
You're reconstructing, on your own engine, what VLLM_LOGGING_LEVEL=DEBUG shows you on the real
one (lab-02). Map each transition to EngineCore.step (core.py:428).
Method
mini_vllm.LLMEngine exposes scheduler (with .running/.waiting) and step(). Drive the
loop manually:
    eng = LLMEngine(**engine_kwargs)
    rid = eng.add_request(prompt, sampling_params)
    events = []
    while eng.scheduler.has_unfinished_requests():
        eng.step()
        for r in eng.scheduler.running:
            events.append(Event(r.request_id, r.status.name, r.num_computed_tokens, r.num_tokens))
        # also capture finished requests in the step return value
(The exact capture is the lab's job; the test pins the resulting trace's shape.)
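To see why the finished requests need separate capture, here is a self-contained toy that mimics the loop. `ToyEngine`, `ToyRequest`, and the shape of the `step()` return value are all assumptions made for this sketch; they are not the mini_vllm API, and whether mini_vllm drops a request from `running` in the same step it finishes is something to verify in the lab.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import NamedTuple


class Status(Enum):
    WAITING = auto()
    RUNNING = auto()
    FINISHED = auto()


class Event(NamedTuple):
    request_id: str
    status: str
    num_computed_tokens: int
    num_tokens: int


@dataclass
class ToyRequest:
    request_id: str
    num_prompt_tokens: int
    max_new_tokens: int
    status: Status = Status.WAITING
    num_computed_tokens: int = 0
    num_tokens: int = 0

    def __post_init__(self):
        # Before any step, the only tokens are the prompt tokens.
        self.num_tokens = self.num_prompt_tokens


class ToyEngine:
    """Stand-in, NOT mini_vllm: whole-prompt prefill, then one token per step."""

    def __init__(self):
        self.waiting = []
        self.running = []

    def add_request(self, req):
        self.waiting.append(req)

    def has_unfinished_requests(self):
        return bool(self.waiting or self.running)

    def step(self):
        """Advance every request by one step; return the ones that finished."""
        while self.waiting:
            req = self.waiting.pop(0)
            req.status = Status.RUNNING
            self.running.append(req)
        finished = []
        for req in list(self.running):
            # Compute all tokens not yet computed (the prompt on the first
            # step, one token afterwards), then sample one new token.
            req.num_computed_tokens = req.num_tokens
            req.num_tokens += 1
            if req.num_tokens - req.num_prompt_tokens >= req.max_new_tokens:
                req.status = Status.FINISHED
                self.running.remove(req)  # gone from running in this same step
                finished.append(req)
        return finished


def trace(engine):
    events = []
    while engine.has_unfinished_requests():
        finished = engine.step()
        # Record live requests plus the ones that finished during this step.
        for r in engine.running + finished:
            events.append(
                Event(r.request_id, r.status.name, r.num_computed_tokens, r.num_tokens)
            )
    return events


engine = ToyEngine()
engine.add_request(ToyRequest("r0", num_prompt_tokens=4, max_new_tokens=3))
events = trace(engine)
```

In this toy, reading only `engine.running` would never record a FINISHED event, because the request has already left the list by the time the step returns.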
Definition of done
pytest phase-01-architecture-and-request-lifecycle/labs -q
Then answer: at which step does num_computed_tokens first equal num_prompt_tokens (prefill
done)? After that, how much does it grow per step (decode = 1)? Why does that match the
prefill/decode model from Phase 0?
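One way to answer the first two questions mechanically is to pull the per-step `num_computed_tokens` values out of your trace and difference them. The helper names and the sample values below are hypothetical; substitute the counts from your own run.

```python
def prefill_done_step(computed, num_prompt_tokens):
    """Index of the first step whose count equals the prompt length, else None."""
    for i, count in enumerate(computed):
        if count == num_prompt_tokens:
            return i
    return None


def per_step_growth(computed):
    """Difference between consecutive per-step counts."""
    return [b - a for a, b in zip(computed, computed[1:])]


# Illustrative counts for a 4-token prompt, one entry per step.
computed = [4, 5, 6]
done_at = prefill_done_step(computed, num_prompt_tokens=4)
growth_after_prefill = per_step_growth(computed[done_at:])
```

If every entry of `growth_after_prefill` is 1, the trace shows one token computed per decode step, which is the "decode = 1" behaviour the question refers to.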
Map to the real engine
| your trace | real vLLM |
|---|---|
| status transitions | RequestStatus (request.py:315) |
| per-step counter advance | update_from_output (scheduler.py:1283) |
| the loop you drive | EngineCore.step (core.py:428) |
| reading scheduler.running | the real Scheduler.running list |