Phase 14 — Deep Dive: Model Architectures (Adding a Model)
Read this with upstream/ open. Every path is relative to upstream/ at the pinned commit v0.22.1 @ 0decac0 (UPSTREAM_PIN.md). If a line number ever drifts, search for the named symbol instead.
Guided reading list
Work through these in order. This is a scaffold: the reading targets and the questions are real; fill in the line-by-line annotations as you go (this is exactly the muscle a maintainer uses — reading unfamiliar code and extracting its contract).
- vllm/model_executor/models/llama.py: The reference decoder-only implementation.
  - Read it, then write 3 sentences in your lab notebook: what data structure, what invariant, what edge case.
- vllm/model_executor/models/registry.py: The architecture registry.
  - Read it, then write 3 sentences in your lab notebook: what data structure, what invariant, what edge case.
- vllm/model_executor/model_loader/: Weight loading + checkpoint format handling.
  - Read it, then write 3 sentences in your lab notebook: what data structure, what invariant, what edge case.
- vllm/model_executor/models/mamba.py: A state-space (non-attention) model.
  - Read it, then write 3 sentences in your lab notebook: what data structure, what invariant, what edge case.
- vllm/model_executor/models/interfaces.py: Mixins: SupportsLoRA, SupportsPP, SupportsMultiModal, ...
  - Read it, then write 3 sentences in your lab notebook: what data structure, what invariant, what edge case.
- tests/models/: How model correctness is tested upstream (logit/greedy equality).
  - Read it, then write 3 sentences in your lab notebook: what data structure, what invariant, what edge case.
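Before opening registry.py, it can help to have a toy picture of what "a name resolves to a class" might look like. The sketch below is not the upstream code: the table, the architecture names, and resolve_architecture are all hypothetical, and standard-library classes stand in for model classes so the example runs without vLLM installed. It assumes the registry is a mapping from architecture name to a (module, class name) pair that is imported only on demand; check that assumption against the real file as you read.

```python
import importlib

# Hypothetical table: architecture name (as it might appear in a checkpoint's
# config) -> (module path, class name). Standard-library classes stand in for
# model classes so this sketch runs anywhere.
_TOY_REGISTRY = {
    "ToyOrderedForCausalLM": ("collections", "OrderedDict"),
    "ToyCounterForCausalLM": ("collections", "Counter"),
}


def resolve_architecture(architectures):
    """Return the class for the first architecture name that is registered."""
    for name in architectures:
        entry = _TOY_REGISTRY.get(name)
        if entry is None:
            continue  # unknown name: try the next candidate
        module_path, class_name = entry
        # Import lazily, so unused model modules are never loaded.
        module = importlib.import_module(module_path)
        return getattr(module, class_name)
    raise ValueError(f"No supported architecture in {architectures!r}")


if __name__ == "__main__":
    cls = resolve_architecture(["UnknownArch", "ToyCounterForCausalLM"])
    print(cls.__name__)
```

While reading the real registry, note in your lab notebook where it differs from this toy: what the data structure actually is, what invariant the table must keep, and what happens for an unregistered name.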
Questions to answer as you read
- The model contract: what do __init__(vllm_config) and forward(input_ids, positions, ...) -> hidden each take and return?
- vLLM building blocks: what do VocabParallelEmbedding, {Column,Row}ParallelLinear, Attention, and RMSNorm each provide?
- Weight loading: how does load_weights remap names from HF checkpoints?
- The model registry: how does a name resolve to a class?
- Families: how do decoder-only (Llama), MoE (Mixtral), hybrid/SSM (Mamba/Jamba), and pooling/reward models differ?
- Compatibility hooks: where do get_input_embeddings, tie_word_embeddings, and LoRA/quant support plug in?
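For the weight-loading question, a small pure-Python sketch of name remapping may make the target concrete. It assumes a pattern in which several checkpoint tensors (for example separate q/k/v projections) feed one fused model parameter, each tagged with a shard id. The mapping table and the helpers remap_weight_name and plan_loading are hypothetical illustrations, not vLLM's load_weights; compare them with what you find in llama.py.

```python
# Hypothetical mapping: (fused parameter name, checkpoint name, shard id).
STACKED_PARAMS_MAPPING = [
    ("qkv_proj", "q_proj", "q"),
    ("qkv_proj", "k_proj", "k"),
    ("qkv_proj", "v_proj", "v"),
    ("gate_up_proj", "gate_proj", 0),
    ("gate_up_proj", "up_proj", 1),
]


def remap_weight_name(checkpoint_name):
    """Map a checkpoint tensor name to (model parameter name, shard id).

    The shard id is None when the tensor loads one-to-one.
    """
    parts = checkpoint_name.split(".")
    for fused, source, shard_id in STACKED_PARAMS_MAPPING:
        if source in parts:
            # Swap the per-projection component for the fused one.
            parts[parts.index(source)] = fused
            return ".".join(parts), shard_id
    return checkpoint_name, None


def plan_loading(checkpoint_names):
    """Group checkpoint tensors by the model parameter they feed."""
    plan = {}
    for name in checkpoint_names:
        target, shard_id = remap_weight_name(name)
        plan.setdefault(target, []).append((name, shard_id))
    return plan


if __name__ == "__main__":
    names = [
        "model.layers.0.self_attn.q_proj.weight",
        "model.layers.0.self_attn.k_proj.weight",
        "model.layers.0.self_attn.v_proj.weight",
        "model.norm.weight",
    ]
    for target, sources in plan_loading(names).items():
        print(target, sources)
```

Matching on whole dotted components rather than substrings is a deliberate choice in this sketch: it avoids accidentally rewriting a name that merely contains "up_proj" inside a longer component. Whether upstream makes the same choice is one of the things to verify while reading.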
Cross-references
- Intuition: 00-guide.md
- Build it yourself: 02-mini-build.md
- The gold-standard depth to emulate: Phase 02 deep-dive.