Phase 14 — Deep Dive: Model Architectures (Adding a Model)

Read this with upstream/ open. Every path is relative to upstream/ at the pinned commit v0.22.1 @ 0decac0 (UPSTREAM_PIN.md). If a line number ever drifts, search for the named symbol instead.
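Line numbers drift between commits, so looking a symbol up by name is the more reliable habit. Below is a minimal sketch that uses only Python's standard ast module. find_symbol is a hypothetical helper and is not part of vLLM or this course's tooling. It only finds names bound by def, async def, or class. It will not find module-level assignments such as registry dicts.

```python
import ast
from pathlib import Path


def find_symbol(path: str, name: str) -> list[tuple[str, int]]:
    """Return (qualified_name, line) for each def/class named `name` in `path`.

    Hypothetical helper: it walks the syntax tree instead of trusting line
    numbers, so it keeps working when edits shift code between commits.
    """
    source = Path(path).read_text(encoding="utf-8")
    tree = ast.parse(source, filename=path)
    hits: list[tuple[str, int]] = []

    def visit(node: ast.AST, scope: str) -> None:
        for child in ast.iter_child_nodes(node):
            if isinstance(child, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
                qualname = f"{scope}.{child.name}" if scope else child.name
                if child.name == name:
                    hits.append((qualname, child.lineno))
                visit(child, qualname)  # descend with the enlarged scope
            else:
                visit(child, scope)  # e.g. statements nested under if/try

    visit(tree, "")
    return hits
```

For example, find_symbol("upstream/vllm/model_executor/models/llama.py", "load_weights") lists each load_weights defined in that file at whatever commit is checked out. Each result is qualified by its enclosing class. For dict entries and other assignments, a plain text search is the simpler tool.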

Guided reading list
Questions to answer as you read
Cross-references

Guided reading list

Work through these in order. This is a scaffold. The reading targets and the questions are fixed, and you fill in the line-by-line annotations as you go. That is exactly the skill a maintainer relies on: reading unfamiliar code and extracting its contract.

vllm/model_executor/models/llama.py — The reference decoder-only implementation.
- Read it, then write 3 sentences in your lab notebook: what data structure, what invariant, what edge case.
vllm/model_executor/models/registry.py — The architecture registry.
- Read it, then write 3 sentences in your lab notebook: what data structure, what invariant, what edge case.
vllm/model_executor/model_loader/ — Weight loading + checkpoint format handling.
- Read it, then write 3 sentences in your lab notebook: what data structure, what invariant, what edge case.
vllm/model_executor/models/mamba.py — A state-space (non-attention) model.
- Read it, then write 3 sentences in your lab notebook: what data structure, what invariant, what edge case.
vllm/model_executor/models/interfaces.py — Mixins: SupportsLoRA, SupportsPP, SupportsMultiModal, ...
- Read it, then write 3 sentences in your lab notebook: what data structure, what invariant, what edge case.
tests/models/ — How model correctness is tested upstream (logit/greedy equality).
- Read it, then write 3 sentences in your lab notebook: what data structure, what invariant, what edge case.
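Before you read registry.py, it may help to see a toy version of the pattern behind the registry question. The sketch maps architecture names to classes. These names are the strings in a checkpoint's config.json "architectures" list. Each name points either to a (module, class) pair that is imported on first use or to a class that is already imported. Everything here is hypothetical, including ToyModelRegistry, its methods, and the example entries. Upstream's documented entry point for out-of-tree models is ModelRegistry.register_model, but check its exact signature at the pinned commit. The lazy entry points at a standard-library class only so the sketch runs without vLLM installed.

```python
import importlib
from typing import Union

# A registry value is either an imported class or a lazy (module, class) pair.
Target = Union[type, tuple[str, str]]


class ToyModelRegistry:
    """Hypothetical, minimal stand-in for an architecture-name -> class registry."""

    def __init__(self) -> None:
        self._entries: dict[str, Target] = {}

    def register(self, arch: str, target: Target) -> None:
        self._entries[arch] = target

    def supported(self) -> list[str]:
        return sorted(self._entries)

    def resolve(self, architectures: list[str]) -> tuple[type, str]:
        """Return (class, arch) for the first supported name in `architectures`."""
        for arch in architectures:
            target = self._entries.get(arch)
            if target is None:
                continue  # unknown name: try the next one in the list
            if isinstance(target, tuple):
                module_name, class_name = target
                cls = getattr(importlib.import_module(module_name), class_name)
                self._entries[arch] = cls  # cache so the import happens once
                return cls, arch
            return target, arch
        raise ValueError(
            f"None of {architectures} is supported; known: {self.supported()}"
        )


if __name__ == "__main__":
    registry = ToyModelRegistry()

    class MyLlamaForCausalLM:  # hypothetical in-process model class
        pass

    registry.register("MyLlamaForCausalLM", MyLlamaForCausalLM)
    # Lazy entry: a stdlib class stands in for a model module under models/.
    registry.register("LazyDemoArch", ("collections", "OrderedDict"))
    print(registry.resolve(["UnknownArch", "MyLlamaForCausalLM"]))
    print(registry.resolve(["LazyDemoArch"]))
```

Storing (module, class) pairs instead of classes keeps startup cheap. Otherwise, answering "is this architecture supported?" would require importing every model file and its dependencies. Whether the pinned registry.py does exactly this is one of the things to confirm while you read it.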

Questions to answer as you read

The model contract: __init__(vllm_config), forward(input_ids, positions, ...) -> hidden states?
vLLM building blocks: VocabParallelEmbedding, {Column,Row}ParallelLinear, Attention, RMSNorm?
Weight loading: load_weights + the name-remapping from HF checkpoints?
The model registry and how a name resolves to a class?
Families: decoder-only (Llama), MoE (Mixtral), hybrid/SSM (Mamba/Jamba), pooling/reward?
get_input_embeddings, tie_word_embeddings, LoRA/quant compatibility hooks?
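The weight-loading and tie_word_embeddings questions are easier to reason about once the remapping is written out. Llama-style models in vLLM fuse q/k/v into one qkv_proj parameter and gate/up into one gate_up_proj parameter. A checkpoint tensor named after an unfused projection therefore has to be routed to the fused parameter together with a shard id. In llama.py, load_weights does this with a stacked_params_mapping table. The sketch below reproduces only that name arithmetic, in plain Python. Treat the following as assumptions to check against the pinned llama.py: the helper name remap_hf_name, the table contents, and the two skip rules (rotary inv_freq buffers, and lm_head when embeddings are tied). The real loader also passes each tensor to the parameter's weight_loader, which copies the right slice. That step is omitted here.

```python
from typing import Optional, Union

ShardId = Union[str, int]

# (fused_param_suffix, checkpoint_suffix, shard_id), mirroring the
# stacked-parameter table of Llama-style models (verify at the pin).
STACKED_PARAMS_MAPPING: list[tuple[str, str, ShardId]] = [
    (".qkv_proj", ".q_proj", "q"),
    (".qkv_proj", ".k_proj", "k"),
    (".qkv_proj", ".v_proj", "v"),
    (".gate_up_proj", ".gate_proj", 0),
    (".gate_up_proj", ".up_proj", 1),
]


def remap_hf_name(
    name: str, tie_word_embeddings: bool
) -> Optional[tuple[str, Optional[ShardId]]]:
    """Map one checkpoint tensor name to (vllm_param_name, shard_id).

    Returns None for tensors that should be skipped. Hypothetical helper.
    """
    # Rotary frequency tables are recomputed at init time, not loaded.
    if "rotary_emb.inv_freq" in name:
        return None
    # With tied embeddings, lm_head reuses embed_tokens' weight.
    if tie_word_embeddings and name.startswith("lm_head."):
        return None
    # Unfused projection -> fused parameter plus the shard it fills.
    for fused, unfused, shard_id in STACKED_PARAMS_MAPPING:
        if unfused in name:
            return name.replace(unfused, fused), shard_id
    # Everything else maps one-to-one and has no shard id.
    return name, None


if __name__ == "__main__":
    for ckpt_name in [
        "model.layers.0.self_attn.k_proj.weight",
        "model.layers.0.mlp.up_proj.weight",
        "model.norm.weight",
        "lm_head.weight",
    ]:
        print(ckpt_name, "->", remap_hf_name(ckpt_name, tie_word_embeddings=True))
```

The shard id is what lets one fused parameter accept two or three separately stored checkpoint tensors. In the real code, the parameter's weight_loader uses it to choose which slice of qkv_proj or gate_up_proj to write, and that is also where tensor-parallel sharding is handled.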

Cross-references

Intuition: 00-guide.md
Build it yourself: 02-mini-build.md
The gold-standard depth to emulate: Phase 02 deep-dive.

vLLM Mastery — From Zero to Maintainer
