Lab 19-02 — The Mock Staff Loop `[CPU-OK]`

Eighteen phases of INTERVIEW.md files exist for this moment: a full, timed, self-administered staff-engineer loop of four sessions, graded against the model answers and CAREER.md's competency map, with the gaps feeding a revision list rather than an ego. There are two deliverables: your scored competency matrix and the one-pager from the design session. This is the course's exit exam, and you are both candidate and (the harder job) honest grader.

Why this lab exists
The loop format
Session guide
Grading honestly
Hitchhiker's notes
Going further
References

Why this lab exists

Knowledge you can't produce under time pressure, out of order, against follow-up questions, isn't yet yours — it's still the book's. Staff loops test exactly the transformations this course optimized for: derive rather than recall (the economics labs), name the invariant under the feature (every lab's tables), state the trade with both sides priced (every Hitchhiker's note). The mock loop is where you find which of those moves are reflexes now and which still need the page open. Run it honestly once and your revision list writes itself; run it honestly twice, a month apart, and you'll have the rare commodity of calibrated confidence going into real loops — or real design meetings, which are the same exam with stakes.

The loop format

Four sessions, strictly timed and taken in one sitting if you can manage it (fatigue realism included), with notes opened only after each session ends:

| Session | Time | Source material |
|---|---|---|
| 1. Fundamentals rapid-fire | 30 min | 2 questions each from phases 0–3 INTERVIEW.md, randomized |
| 2. Systems deep-dive | 45 min | 1 question each from phases 4–8, with self-posed follow-ups |
| 3. Design: "serve X under SLO Y on hardware Z" | 60 min | Construct from phases 10/15/18 (3 scenarios below) |
| 4. Debugging scenario | 30 min | Pick 2 from the symptom catalog below |
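The pace the table implies is worth computing before the clock starts. The sketch below assumes the phase ranges are inclusive (phases 0 to 3 is four phases, 4 to 8 is five) and that time is split evenly within a session. `SESSIONS`, `minutes_per_question`, and `draw_questions` are hypothetical names, and the question bank is passed in as plain lists because the layout of the INTERVIEW.md files is not specified here.

```python
import random

# (name, minutes, phases, questions per phase) for the two question-driven
# sessions; sessions 3 and 4 are scenario-based and are only counted in the total.
SESSIONS = [
    ("Fundamentals rapid-fire", 30, range(0, 4), 2),
    ("Systems deep-dive", 45, range(4, 9), 1),
]

TOTAL_MINUTES = 30 + 45 + 60 + 30  # all four sessions back to back


def minutes_per_question(minutes, phases, per_phase):
    """Even split of a session's time across its questions."""
    return minutes / (len(phases) * per_phase)


def draw_questions(bank, phases, per_phase, seed=None):
    """Sample per_phase questions from each phase, then shuffle the order.

    bank: dict mapping phase number -> list of question strings
    (a hypothetical shape; fill it however you store the question bank).
    """
    rng = random.Random(seed)
    picked = [(p, q) for p in phases for q in rng.sample(bank[p], per_phase)]
    rng.shuffle(picked)
    return picked


for name, minutes, phases, per_phase in SESSIONS:
    pace = minutes_per_question(minutes, phases, per_phase)
    print(f"{name}: {pace:.2f} min/question")
print(f"Total sitting: {TOTAL_MINUTES} min")
```

Under those assumptions the fundamentals session allows under four minutes per question, while the deep-dive's nine minutes per question is what leaves room for self-posed follow-ups.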

Design scenarios (pick one): (a) 70B chat, p99 TTFT < 1 s / ITL < 30 ms, 16 A100s across 2 nodes; (b) 100-tenant fine-tune platform, 8B base, 8 GPUs; (c) agentic workload, 8B + heavy tool calling, single-stream-latency-obsessed, 4 H100s. Produce the one-pager: topology (TP/PP/replicas/disagg), knobs with values and reasons, capacity arithmetic shown, the two biggest risks named.
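For scenario (a), the first line of "capacity arithmetic shown" is usually the weight footprint per GPU for each candidate topology. The following is a minimal sketch, not a recommendation. It assumes FP16/BF16 weights (2 bytes per parameter), 80 GB A100s treated as 80 GiB (the scenario does not state the memory size), tensor parallelism kept inside one 8-GPU node, power-of-two degrees only, and an even shard across ranks. `Topology`, `topologies`, and `weight_gib_per_gpu` are hypothetical names.

```python
from dataclasses import dataclass

GIB = 1024**3


@dataclass(frozen=True)
class Topology:
    tp: int        # tensor-parallel degree
    pp: int        # pipeline-parallel degree
    replicas: int  # independent copies of the model


def topologies(total_gpus, gpus_per_node):
    """Power-of-two (TP, PP, replicas) splits that use every GPU, TP within a node."""
    out = []
    tp = 1
    while tp <= gpus_per_node:
        pp = 1
        while tp * pp <= total_gpus:
            if total_gpus % (tp * pp) == 0:
                out.append(Topology(tp, pp, total_gpus // (tp * pp)))
            pp *= 2
        tp *= 2
    return out


def weight_gib_per_gpu(params, bytes_per_param, topo):
    """Weights only, assuming an even shard across TP x PP ranks."""
    return params * bytes_per_param / (topo.tp * topo.pp) / GIB


PARAMS = 70e9        # scenario (a)
BYTES_PER_PARAM = 2  # assumption: FP16/BF16 weights
HBM_GIB = 80         # assumption: 80 GB A100s; the scenario does not say

for t in topologies(16, 8):
    w = weight_gib_per_gpu(PARAMS, BYTES_PER_PARAM, t)
    if w < HBM_GIB:  # skip splits where the weights alone do not fit
        print(f"TP={t.tp} PP={t.pp} x{t.replicas}: {w:.1f} GiB weights/GPU, "
              f"{HBM_GIB - w:.1f} GiB left before activations and KV")
```

The enumeration only lists what fits. Choosing among the rows still requires the communication cost and the KV budget, which this sketch deliberately leaves out.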

Debugging symptoms (pick two; talk through the diagnosis tree out loud): p99 ITL spikes hourly (Phase 3/18); throughput fell 30% after a model swap (Phase 4/6 — check the backend line); tenant 7 complains, dashboards green (Phase 11 — slot thrash); seeded requests not reproducing (Phase 9); outputs differ across TP sizes (Phase 10 — the last ulp); VLM TTFT doubled (Phase 13 — image sizes).

Session guide

Answer out loud or in writing — producing is the test; reading silently grades as zero. For each question, the staff-grade answer has three layers, and you should consciously hit all three: the mechanism (what happens), the invariant or arithmetic underneath (why it must be so — quote the formula, name the I-number), and the operational consequence (what you'd do about it at 3 a.m.). The model answers in the INTERVIEW.md files are written in roughly that shape; grade against the shape, not just the facts.

Grading honestly

Score each competency row from CAREER.md's map: 3 = derived it cold, follow-ups survived; 2 = got there with hesitation or one peek; 1 = knew of it; 0 = blank. The two honesty rules: a peek caps the row at 2 (that's what the peek means), and an answer that skipped the arithmetic when arithmetic existed caps at 2 (staff answers compute — the course's whole thesis). Rows at ≤1 map directly to phases; that's your revision list, and the labs are designed for exactly this kind of targeted re-entry (each phase's index lists its skills).
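The scoring rules are mechanical enough to write down, which removes the temptation to round up. A minimal sketch follows; the `Row` fields and function names are hypothetical, and the real competency names come from CAREER.md.

```python
from dataclasses import dataclass


@dataclass
class Row:
    competency: str
    phase: int                        # phase the row maps back to
    raw_score: int                    # 0..3, your first honest impression
    peeked: bool = False
    skipped_arithmetic: bool = False  # arithmetic existed and you did not do it


def final_score(row):
    """Apply the two honesty caps: either one limits the row to 2."""
    if not 0 <= row.raw_score <= 3:
        raise ValueError(f"score out of range: {row.raw_score}")
    if row.peeked or row.skipped_arithmetic:
        return min(row.raw_score, 2)
    return row.raw_score


def revision_list(rows):
    """Rows scoring 1 or below, lowest score first, each tagged with its phase."""
    weak = [(final_score(r), r.phase, r.competency)
            for r in rows if final_score(r) <= 1]
    return sorted(weak)
```

A usage example with made-up rows:

```python
rows = [
    Row("KV budget arithmetic", phase=0, raw_score=3, peeked=True),  # capped to 2
    Row("Scheduler preemption", phase=3, raw_score=1),
    Row("TP comm bill", phase=10, raw_score=0),
]
for score, phase, name in revision_list(rows):
    print(f"phase {phase}: {name} (score {score})")
```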

Hitchhiker's notes

The design session is the one that decides real loops — and its failure mode is breadth without commitment. Force yourself to choose (TP=4 PP=2, not "TP or maybe PP") and defend with the lab arithmetic (Phase 10 lab-03's comm bill, Phase 0 lab-02's KV budget, Phase 15 lab-03's toll). Reviewers — real and self — reward a defended wrong choice over an undefended hedge.
Say the numbers out loud. "128 KiB per token, so 2048-token contexts cost 256 MiB each, so 8 GiB of free HBM holds ~32 of them" is a staff sentence; "KV is big" is not. The course gave you maybe twenty such derivations — sessions 1 and 3 should each surface five.
Interviewing the interviewer: after each model-answer comparison, ask what follow-up the answer invites and answer that too. Real loops live in the follow-ups; the INTERVIEW.md files seed them deliberately.
A month later, rerun only the rows you revised. Spaced, targeted, calibrated: the same discipline as performance work (measure, change one thing, measure).
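The "128 KiB per token" sentence quoted in the notes above can be checked in a few lines. The model shape here is an assumption chosen because it reproduces that figure (an 8B-class model with grouped-query attention in FP16: 32 layers, 8 KV heads, head dimension 128); the course's own derivation may start from a different model.

```python
KIB, MIB, GIB = 1024, 1024**2, 1024**3


def kv_bytes_per_token(layers, kv_heads, head_dim, dtype_bytes):
    """K and V each store layers * kv_heads * head_dim elements per token."""
    return 2 * layers * kv_heads * head_dim * dtype_bytes


# Assumption: 32 layers, 8 KV heads, head_dim 128, 2-byte (FP16) elements.
per_token = kv_bytes_per_token(layers=32, kv_heads=8, head_dim=128, dtype_bytes=2)
per_context = per_token * 2048          # one 2048-token context
contexts = (8 * GIB) // per_context     # how many fit in 8 GiB of free HBM

print(per_token // KIB, "KiB per token")
print(per_context // MIB, "MiB per 2048-token context")
print(contexts, "contexts in 8 GiB")
```

Being able to rebuild each factor (the 2 for K and V, the layer count, the KV heads rather than the query heads) is what separates deriving the number from recalling it.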

Going further

Trade loops with a colleague — grading someone else against the model answers teaches more than being graded, and explaining a phase you "know" is the final filter for whether you do.
Take the design one-pager from session 3 and cost it on real cloud prices — the startup half of CAREER.md begins exactly there (capacity arithmetic × dollars = the unit economics every inference company lives or dies by).
Publish your best answer (blog post, internal doc) — the act of writing for strangers finds the remaining gaps, and the artifact compounds the way merged PRs do.

References

The INTERVIEW.md in every phase directory — the question bank.
CAREER.md — the competency map you're scoring against, and the maintainer/staff/startup paths the scores feed.
Lab-01 — the other half of the capstone: the loop proves you can explain the engine; the merged PR proves you can change it. Exit with both.

vLLM Mastery — From Zero to Maintainer