Phase 12 Labs — Structured Outputs

Three labs that turn "please respond with JSON" into a mathematical guarantee. The arc: build the regex→FSM→token-mask pipeline and its adversarial proof (lab-01); cross the regular/context-free boundary with a pushdown machine for JSON, and get caught by the fuzz oracle on a grammar corner (lab-03); then measure the industrialized version (xgrammar via guided_json) producing 50 of 50 schema-valid outputs on real hardware (lab-02).

Recommended order: 01 → 03 → 02. CPU labs follow the standard contract — starter.py (your work), solution.py (reference), test_lab.py (the spec); default runs the solution, LAB_IMPL=starter grades yours.

# Whole phase (GPU tests auto-skip without CUDA):
pytest phase-12-structured-outputs/labs -m "not gpu"

# Grade yourself on one lab:
LAB_IMPL=starter pytest phase-12-structured-outputs/labs/lab-01-regex-fsm-mask -q

Labs

lab-01-regex-fsm-mask [CPU-OK]

The three moves of constrained decoding: compile a pattern to a char-level FSM, lift it to token masks (a token is allowed iff its characters keep the machine alive — the outlines insight, including multi-char tokens crossing atom boundaries), and gate EOS on accepting states. Proven against an adversarial model that prefers garbage and emits valid hex anyway — plus the honest truncation-caveat test (prefix-valid ≠ complete). Skills: masks edit support, not mood; the compile-time/runtime split; char→token lifting; the max_tokens trap.
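A minimal sketch of those three moves, under stated assumptions: the pattern (exactly four lowercase hex digits), the toy vocabulary, and the names step, walk, allowed_tokens, constrained_decode and ADVERSARY are all hypothetical and are not taken from the lab's solution.py. It shows a multi-char token ("ff") being checked character by character, EOS gated on the accepting state, and the truncation caveat.

```python
# Toy constrained decoder: char-level FSM -> token mask -> EOS gating.
# Assumption: the pattern is "exactly four lowercase hex digits".
HEX = set("0123456789abcdef")
LENGTH = 4
ACCEPTING = {LENGTH}          # state n means "n hex digits consumed"
EOS = "<eos>"
VOCAB = ["a", "ff", "0x", "zz", "1g", "dead", " ", EOS]  # hypothetical vocabulary


def step(state, ch):
    """Advance the char-level FSM by one character; None means the machine died."""
    if ch in HEX and state < LENGTH:
        return state + 1
    return None


def walk(state, token):
    """Lift the FSM to tokens: feed every character of the token in turn."""
    for ch in token:
        state = step(state, ch)
        if state is None:
            return None
    return state


def allowed_tokens(state):
    """One boolean per vocabulary entry; EOS is legal only in an accepting state."""
    return [
        (state in ACCEPTING) if tok == EOS else (walk(state, tok) is not None)
        for tok in VOCAB
    ]


def constrained_decode(scores, max_tokens):
    """Greedy decoding over the masked vocabulary. Returns (text, is_complete)."""
    state, out = 0, []
    for _ in range(max_tokens):
        mask = allowed_tokens(state)
        legal = [tok for tok, ok in zip(VOCAB, mask) if ok]
        best = max(legal, key=lambda tok: scores[tok])
        if best == EOS:
            break
        out.append(best)
        state = walk(state, best)
    return "".join(out), state in ACCEPTING


# An adversarial "model": it scores every garbage token above every valid one.
ADVERSARY = {"zz": 9, "1g": 8, "0x": 7, " ": 6, EOS: 5, "a": 4, "ff": 3, "dead": 2}

print(constrained_decode(ADVERSARY, max_tokens=8))  # ('aaaa', True)
print(constrained_decode(ADVERSARY, max_tokens=2))  # ('aa', False)
```

The second call is the max_tokens trap in miniature: every emitted prefix is valid, but the budget runs out before the machine reaches an accepting state, so the output is prefix-valid and incomplete.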

lab-02-json-schema-constrained [GPU-OPT]

The verification protocol on real vLLM: one schema, 50 prompts, two arms, and a strict jsonschema validator. Baseline: 31/50 valid (the failures are mostly JSON wrapped in chat prose); guided: 50/50. Plus the operational signatures: +210 ms first-request grammar compile, and the finish_reason: "length" truncation trap sprung deliberately. Annotated capture included. Skills: control-arm benchmarking; the four guided formats; user-supplied schemas as an operational risk surface.
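The two-arm tally can be sketched without a GPU. This is an illustration only: the lab validates with the jsonschema library against a real schema, whereas is_valid below is a hand-rolled stdlib stand-in, and REQUIRED, score_arm and the sample responses are invented for the example (they are not the lab's capture).

```python
import json

# Assumption: a tiny stand-in for a schema, mapping required field -> exact type.
REQUIRED = {"name": str, "age": int}


def is_valid(text):
    """Strict check: the WHOLE response must parse as a JSON object
    whose required fields have exactly the expected types."""
    try:
        obj = json.loads(text)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return False
    if not isinstance(obj, dict):
        return False
    return all(type(obj.get(key)) is typ for key, typ in REQUIRED.items())


def score_arm(responses):
    """responses: list of (text, finish_reason) pairs for one arm.
    Truncated responses are counted separately so they are never
    mistaken for validator failures (or silently parsed)."""
    tally = {"valid": 0, "invalid": 0, "truncated": 0}
    for text, finish_reason in responses:
        if finish_reason == "length":
            tally["truncated"] += 1
        elif is_valid(text):
            tally["valid"] += 1
        else:
            tally["invalid"] += 1
    return tally


# Made-up responses, chosen to show the failure shapes described above.
baseline = [
    ('Sure! Here is the JSON: {"name": "Ada", "age": 36}', "stop"),  # chat wrapper
    ('{"name": "Ada", "age": 36}', "stop"),
    ('{"name": "Ada", "age": "36"}', "stop"),                        # wrong type
]
guided = [
    ('{"name": "Ada", "age": 36}', "stop"),
    ('{"name": "Grace", "age": 45}', "stop"),
    ('{"name": "Ada", "ag', "length"),                               # ran out of tokens
]

print("baseline:", score_arm(baseline))
print("guided:  ", score_arm(guided))
```

Checking finish_reason before validating is the design choice worth copying: a constrained response cut off by the token limit is a budget problem, not a grammar problem, and the two need different fixes.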

lab-03-json-pushdown [CPU-OK]

Why regex isn't enough: JSON nests, nesting needs a stack, and you'll build the pushdown machine (modes + depth) whose mask is stack-aware — a brace-hating model still emits parseable JSON at depth 8. Featuring the lab's best war story: the json.loads fuzz oracle caught the reference implementation accepting 0123 (JSON forbids leading zeros) — grammar bugs need independent oracles. Skills: the regular/CFG boundary as a product boundary; resume-the-parent via the stack; oracle-driven grammar debugging; checkpointable machines for spec-decode composition.
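The oracle idea fits in a few lines. In this sketch the two regexes are hypothetical stand-ins for a number rule before and after the fix (the lab's machine is a pushdown automaton, not a regex), and find_disagreements and fuzz_numbers are illustrative names. The point is only that json.loads, an independent implementation, is asked the same question as the grammar, and any disagreement is a bug in one of them.

```python
import json
import random
import re

# Assumption: a deliberately buggy number rule that permits leading zeros,
# and a corrected one that follows the JSON number grammar.
BUGGY_NUMBER = re.compile(r"-?[0-9]+(\.[0-9]+)?([eE][+-]?[0-9]+)?")
FIXED_NUMBER = re.compile(r"-?(0|[1-9][0-9]*)(\.[0-9]+)?([eE][+-]?[0-9]+)?")


def oracle_accepts(text):
    """Independent oracle: does the standard library parse this as JSON?"""
    try:
        json.loads(text)
        return True
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return False


def find_disagreements(grammar, candidates):
    """Return every candidate where the grammar and the oracle disagree."""
    return [
        c for c in candidates
        if (grammar.fullmatch(c) is not None) != oracle_accepts(c)
    ]


def fuzz_numbers(rng, n, max_len=5):
    """Random short strings over the characters a JSON number can contain."""
    alphabet = "0123456789-.eE+"
    return [
        "".join(rng.choice(alphabet) for _ in range(rng.randint(1, max_len)))
        for _ in range(n)
    ]


print(find_disagreements(BUGGY_NUMBER, ["0123", "123", "0", "-0.5"]))  # ['0123']

cases = fuzz_numbers(random.Random(0), 2000)
print(len(find_disagreements(FIXED_NUMBER, cases)))  # 0
```

One caveat about the oracle itself: Python's json.loads also accepts NaN and Infinity, which are not JSON, so the fuzz alphabet here deliberately excludes letters other than e and E.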

What you can do after this phase

Explain precisely why constrained decoding guarantees validity (and the two ways it still doesn't: truncation, and bugs in the grammar itself); choose between regex/choice/schema/grammar constraints by their compile cost and expressive need; operate structured-output services with eyes open (grammar cache hit rates, first-request latency, finish_reason hygiene, user-schema risk); and read vllm/v1/structured_output/ as the industrial form of two machines you built by hand. The masks ride Phase 9's processor hook; the per-request grammar state joins Phase 9 lab-04's isolation discipline; and Phase 16's tool-calling parsers consume what these masks guarantee.