Phase 14 — Exercises: Model Architectures (Adding a Model)

Work these after the labs. They escalate from "explain it" to "design it"; staff-level means you can do the last ones cold.

Map an HF attention block's qkv/o weights onto QKVParallelLinear and RowParallelLinear.
What must change to make a model support tensor parallelism correctly?
How would you add a pooling/reward head, and what changes in output handling?
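As a starting point for the first two exercises, here is a minimal pure-Python sketch of the bookkeeping involved. It does not call vLLM. It assumes a Llama-style checkpoint (separate q_proj/k_proj/v_proj/o_proj weights stored as [out_features, in_features], with head_dim = hidden_size / num_heads). The name mapping is modeled on the (param_name, weight_name, shard_id) tuples that vLLM model files such as the Llama implementation use in load_weights; verify the exact form against your upstream/ checkout. The function names and AttnShape are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

# Hypothetical mapping, modeled on the stacked-params tuples vLLM model files
# use to load separate HF q/k/v projections into one fused qkv_proj parameter.
STACKED_PARAMS_MAPPING = [
    (".qkv_proj", ".q_proj", "q"),
    (".qkv_proj", ".k_proj", "k"),
    (".qkv_proj", ".v_proj", "v"),
]


def map_hf_name(hf_name: str) -> Tuple[str, Optional[str]]:
    """Return (fused param name, shard id) for an HF checkpoint key."""
    for param_name, weight_name, shard_id in STACKED_PARAMS_MAPPING:
        if weight_name in hf_name:
            return hf_name.replace(weight_name, param_name), shard_id
    # o_proj is not fused: it keeps its name and loads as a single shard.
    return hf_name, None


@dataclass
class AttnShape:
    hidden_size: int
    num_heads: int      # query heads
    num_kv_heads: int   # key/value heads (fewer than num_heads under GQA)

    @property
    def head_dim(self) -> int:
        return self.hidden_size // self.num_heads


def qkv_rows_for_rank(shape: AttnShape, tp_size: int, rank: int) -> Dict[str, Tuple[int, int]]:
    """Row range of each unfused HF weight that this rank copies into qkv_proj.

    Column parallelism splits the output dim (weight rows), so each rank owns
    whole heads. KV heads are either divided across ranks or, when there are
    fewer KV heads than ranks, replicated on several ranks.
    """
    assert shape.num_heads % tp_size == 0
    d = shape.head_dim
    q_heads = shape.num_heads // tp_size
    if shape.num_kv_heads >= tp_size:
        assert shape.num_kv_heads % tp_size == 0
        kv_heads = shape.num_kv_heads // tp_size
        kv_start = rank * kv_heads
    else:
        assert tp_size % shape.num_kv_heads == 0
        replicas = tp_size // shape.num_kv_heads
        kv_heads = 1
        kv_start = rank // replicas  # consecutive ranks share one KV head
    q = (rank * q_heads * d, (rank + 1) * q_heads * d)
    kv = (kv_start * d, (kv_start + kv_heads) * d)
    return {"q": q, "k": kv, "v": kv}


def o_proj_cols_for_rank(shape: AttnShape, tp_size: int, rank: int) -> Tuple[int, int]:
    """Column range of o_proj.weight this rank keeps.

    Row parallelism splits the input dim (weight columns), which lines up with
    the query heads this rank produced; partial outputs are then all-reduced.
    """
    per_rank = (shape.num_heads // tp_size) * shape.head_dim
    return rank * per_rank, (rank + 1) * per_rank
```

For a Llama-style shape of AttnShape(4096, 32, 8) at tp_size=4, rank 1 copies q rows 1024..2048 and k/v rows 256..512, and keeps o_proj columns 1024..2048, so its o_proj slice consumes exactly the heads its qkv slice produced. At tp_size=16 the 8 KV heads no longer divide evenly, so each one is replicated on a pair of ranks. Checking that these ranges agree is most of what "supports tensor parallelism correctly" means for an attention block.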

Self-grading

For each: could you (a) explain it to a teammate in 2 minutes, and (b) point to the exact upstream/ file that proves your answer? If not, re-read the matching anchor in 01-deep-dive.md.


vLLM Mastery — From Zero to Maintainer
