Phase 12 Labs — Structured Outputs
Three labs that turn "please respond with JSON" into a mathematical guarantee. The
arc: build the regex→FSM→token-mask pipeline and prove it against an adversarial
model (lab-01); cross the regular/context-free boundary with a pushdown machine for
JSON, where the fuzz oracle catches the reference implementation on a grammar corner
case (lab-03); then measure the industrialized version (xgrammar via guided_json)
reaching 50/50 schema validity on real hardware (lab-02).
Recommended order: 01 → 03 → 02. CPU labs follow the standard contract:
starter.py (your work), solution.py (reference), test_lab.py (the spec). By default
the tests run against the solution; set LAB_IMPL=starter to grade yours.
# Whole phase (GPU tests auto-skip without CUDA):
pytest phase-12-structured-outputs/labs -m "not gpu"
# Grade yourself on one lab:
LAB_IMPL=starter pytest phase-12-structured-outputs/labs/lab-01-regex-fsm-mask -q
Contents
- lab-01-regex-fsm-mask [CPU-OK]
- lab-02-json-schema-constrained [GPU-OPT]
- lab-03-json-pushdown [CPU-OK]
- What you can do after this phase
Labs
lab-01-regex-fsm-mask [CPU-OK]
The three moves of constrained decoding: compile a pattern to a char-level FSM, lift it to token masks (a token is allowed iff its characters keep the machine alive — the outlines insight, including multi-char tokens crossing atom boundaries), and gate EOS on accepting states. Proven against an adversarial model that prefers garbage and emits valid hex anyway — plus the honest truncation-caveat test (prefix-valid ≠ complete). Skills: masks edit support, not mood; the compile-time/runtime split; char→token lifting; the max_tokens trap.
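The three moves can be sketched in a few lines. This is a minimal illustration, not the lab's reference code: the pattern `0x[0-9a-f]{2}`, the hand-written DFA, the toy vocabulary, and the `SCORES` dict standing in for an adversarial model are all assumptions made for the example. Note the token `0xf`, which crosses the boundary between the `0x` literal and the hex class.

```python
# Hypothetical sketch: hand-built char-level DFA for 0x[0-9a-f]{2}.
HEX = set("0123456789abcdef")
EOS = "<eos>"
STATES = range(5)   # 0: want '0', 1: want 'x', 2 and 3: want hex, 4: done
ACCEPTING = {4}

def step(state, ch):
    """Char-level transition; None means the machine died."""
    if state == 0:
        return 1 if ch == "0" else None
    if state == 1:
        return 2 if ch == "x" else None
    if state in (2, 3):
        return state + 1 if ch in HEX else None
    return None  # state 4 has no outgoing edges

def run_token(state, token):
    """Lift to tokens: a token is allowed iff every char keeps the machine alive."""
    for ch in token:
        state = step(state, ch)
        if state is None:
            return None
    return state

def build_mask_table(vocab):
    """Compile time: per state, the allowed tokens and the state each lands in."""
    table = {}
    for state in STATES:
        table[state] = {}
        for tok in vocab:
            if tok == EOS:
                continue
            nxt = run_token(state, tok)
            if nxt is not None:
                table[state][tok] = nxt
    return table

def constrained_decode(scores, vocab, max_tokens):
    """Runtime: greedy decode over masked scores; EOS only in accepting states.
    Assumes every non-accepting state has at least one allowed token."""
    table = build_mask_table(vocab)
    state, out = 0, []
    for _ in range(max_tokens):
        allowed = dict(table[state])
        if state in ACCEPTING:
            allowed[EOS] = state
        best = max(allowed, key=lambda t: scores[t])
        if best == EOS:
            return "".join(out), True
        out.append(best)
        state = allowed[best]
    # Budget exhausted: output is prefix-valid, complete only if accepting.
    return "".join(out), state in ACCEPTING

# Toy adversarial "model": garbage and early EOS score highest.
VOCAB = ["hello", "zz", " ", "0", "x", "f", "a", "ff", "0xf", EOS]
SCORES = {"hello": 9.0, "zz": 8.0, EOS: 7.0, " ": 6.0, "0xf": 1.0,
          "0": 0.5, "x": 0.4, "f": 0.3, "a": 0.2, "ff": 0.1}

print(constrained_decode(SCORES, VOCAB, max_tokens=8))  # ('0xf0', True)
print(constrained_decode(SCORES, VOCAB, max_tokens=1))  # ('0xf', False)
```

The second call shows the truncation caveat: with a budget of one token the output `0xf` is a valid prefix of the pattern but not a complete match, so the mask alone cannot guarantee validity when max_tokens is too small.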
lab-02-json-schema-constrained [GPU-OPT]
The verification protocol on real vLLM: one schema, 50 prompts, two arms, and a
strict jsonschema validator. The baseline arm scores 31/50 (most failures are JSON
wrapped in chat text); the guided arm scores 50/50. Plus the operational signatures:
+210 ms of grammar compilation on the first request, and the finish_reason: "length"
truncation trap sprung deliberately. Annotated capture included. Skills: control-arm
benchmarking; the four guided formats; user-supplied schemas as an operational risk
surface.
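The scoring side of a two-arm protocol like this might look as follows. It is a stdlib-only sketch under stated assumptions: the lab uses a real jsonschema validator, whereas `is_valid` below is a hand-rolled stand-in for a hypothetical two-field schema, and the response dicts (with `text` and `finish_reason` keys) are made-up examples, not captured output.

```python
import json

# Hypothetical schema stand-in: required field name -> required Python type.
REQUIRED_FIELDS = {"name": str, "age": int}

def is_valid(text):
    """Strict check: the whole response must be one JSON object with the
    required typed fields. No stripping of chat wrappers or code fences."""
    try:
        obj = json.loads(text)
    except json.JSONDecodeError:
        return False
    if not isinstance(obj, dict):
        return False
    # type(...) is t (not isinstance) so that True/False do not pass as int.
    return all(type(obj.get(key)) is t for key, t in REQUIRED_FIELDS.items())

def score_arm(responses):
    """Tally one arm. Truncated responses are counted separately so a
    max_tokens problem is not mistaken for a grammar problem."""
    tally = {"valid": 0, "invalid": 0, "truncated": 0}
    for r in responses:
        if r["finish_reason"] == "length":
            tally["truncated"] += 1
        elif is_valid(r["text"]):
            tally["valid"] += 1
        else:
            tally["invalid"] += 1
    return tally

# Made-up responses for illustration only.
baseline = [
    {"text": 'Sure! Here is the JSON: {"name": "Ada", "age": 36}',
     "finish_reason": "stop"},
    {"text": '{"name": "Ada", "age": 36}', "finish_reason": "stop"},
]
guided = [
    {"text": '{"name": "Ada", "age": 36}', "finish_reason": "stop"},
    {"text": '{"name": "Ada", "ag', "finish_reason": "length"},
]

print(score_arm(baseline))  # {'valid': 1, 'invalid': 1, 'truncated': 0}
print(score_arm(guided))    # {'valid': 1, 'invalid': 0, 'truncated': 1}
```

Checking finish_reason before validating keeps the two failure modes apart: a chat-wrapped answer is a model behavior problem, while a "length" finish under guided decoding is a budget problem.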
lab-03-json-pushdown [CPU-OK]
Why regex isn't enough: JSON nests, nesting needs a stack, and you'll build the
pushdown machine (modes + depth) whose mask is stack-aware — a brace-hating model
still emits parseable JSON at depth 8. Featuring the lab's best war story: the
json.loads fuzz oracle caught the reference implementation accepting 0123 (JSON
forbids leading zeros) — grammar bugs need independent oracles. Skills: the
regular/CFG boundary as a product boundary; resume-the-parent via the stack;
oracle-driven grammar debugging; checkpointable machines for spec-decode
composition.
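The leading-zero story generalizes into a small differential fuzz loop. The sketch below is an assumption-laden miniature, not the lab's harness: two regexes stand in for the number sub-grammar of the pushdown machine (one with the bug, one fixed), and every short string over a tiny alphabet is compared against json.loads as the independent oracle.

```python
import itertools
import json
import re

# Stand-ins for the number rule of a JSON grammar (hypothetical names).
BUGGY_NUMBER = re.compile(r"-?[0-9]+")            # accepts 0123
FIXED_NUMBER = re.compile(r"-?(0|[1-9][0-9]*)")   # no leading zeros

def oracle_accepts(text):
    """Independent oracle: does the standard library parse this as JSON?"""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False

def disagreements(pattern, alphabet="0123-", max_len=4):
    """Enumerate every string up to max_len and collect the ones where the
    grammar and the oracle disagree about acceptance."""
    bad = []
    for n in range(1, max_len + 1):
        for chars in itertools.product(alphabet, repeat=n):
            text = "".join(chars)
            if bool(pattern.fullmatch(text)) != oracle_accepts(text):
                bad.append(text)
    return bad

buggy = disagreements(BUGGY_NUMBER)
print("0123" in buggy)              # True: the grammar accepts it, JSON does not
print(disagreements(FIXED_NUMBER))  # []
```

Exhaustive enumeration is practical only because the alphabet and length bound are tiny; the point is that the oracle shares no code with the grammar under test, so a mistake in one is unlikely to be mirrored in the other.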
What you can do after this phase
Explain precisely why constrained decoding guarantees validity (and the two ways it
still doesn't: truncation, and bugs in the grammar itself); choose between
regex/choice/schema/grammar constraints by their compile cost and expressive need;
operate structured-output services with eyes open (grammar cache hit rates,
first-request latency, finish_reason hygiene, user-schema risk); and read
vllm/v1/structured_output/ as the industrial form of two machines you built by
hand. The masks ride Phase 9's processor hook; the per-request grammar state joins
Phase 9 lab-04's isolation discipline; and Phase 16's tool-calling parsers consume
what these masks guarantee.