Phase 14 — Deep Dive: Model Architectures (Adding a Model)

Read this with upstream/ open. Every path is relative to upstream/ at the pinned commit v0.22.1 @ 0decac0 (UPSTREAM_PIN.md). If a line number ever drifts, search for the named symbol instead.

Guided reading list

Work through these in order. This is a scaffold: the reading targets and the questions are real, but the line-by-line annotations are yours to fill in as you go. That is exactly the muscle a maintainer uses: reading unfamiliar code and extracting its contract.

  1. vllm/model_executor/models/llama.py — The reference decoder-only implementation.
    • Read it, then write 3 sentences in your lab notebook: what data structure, what invariant, what edge case.
  2. vllm/model_executor/models/registry.py — The architecture registry.
    • Read it, then write 3 sentences in your lab notebook: what data structure, what invariant, what edge case.
  3. vllm/model_executor/model_loader/ — Weight loading + checkpoint format handling.
    • Read it, then write 3 sentences in your lab notebook: what data structure, what invariant, what edge case.
  4. vllm/model_executor/models/mamba.py — A state-space (non-attention) model.
    • Read it, then write 3 sentences in your lab notebook: what data structure, what invariant, what edge case.
  5. vllm/model_executor/models/interfaces.py — Mixins: SupportsLoRA, SupportsPP, SupportsMultiModal, ...
    • Read it, then write 3 sentences in your lab notebook: what data structure, what invariant, what edge case.
  6. tests/models/ — How model correctness is tested upstream (logit/greedy equality).
    • Read it, then write 3 sentences in your lab notebook: what data structure, what invariant, what edge case.
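
Before opening tests/models/, it may help to see what "logit/greedy equality" means in miniature. The sketch below is a toy, not upstream's test harness: `greedy_decode`, `outputs_match`, `max_abs_logit_diff`, and both toy models are hypothetical names invented for illustration. It assumes a "model" is just a function from a token prefix to next-token logits, and shows two checks: exact equality of greedy token sequences, and closeness of logits within a tolerance.

```python
from typing import Callable, List, Sequence

# Assumption for this sketch: a "model" is any callable that maps a token
# prefix to a list of next-token logits. Real models are far richer.
LogitsFn = Callable[[Sequence[int]], List[float]]

VOCAB_SIZE = 5  # toy vocabulary


def greedy_decode(logits_fn: LogitsFn, prompt: Sequence[int],
                  max_new_tokens: int) -> List[int]:
    """Repeatedly append the argmax token; return only the new tokens."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        logits = logits_fn(tokens)
        # Argmax over the vocabulary; ties resolve to the lowest index.
        next_token = max(range(len(logits)), key=lambda i: logits[i])
        tokens.append(next_token)
    return tokens[len(prompt):]


def outputs_match(ref: LogitsFn, cand: LogitsFn,
                  prompts: Sequence[Sequence[int]],
                  max_new_tokens: int) -> bool:
    """Greedy equality: both models must emit identical token sequences."""
    return all(
        greedy_decode(ref, p, max_new_tokens)
        == greedy_decode(cand, p, max_new_tokens)
        for p in prompts
    )


def max_abs_logit_diff(ref: LogitsFn, cand: LogitsFn,
                       prefix: Sequence[int]) -> float:
    """Logit closeness: largest elementwise gap for one prefix."""
    return max(abs(a - b) for a, b in zip(ref(prefix), cand(prefix)))


def reference_model(tokens: Sequence[int]) -> List[float]:
    """Toy reference: always prefers (last token + 1) mod vocab."""
    target = (tokens[-1] + 1) % VOCAB_SIZE
    return [1.0 if i == target else 0.0 for i in range(VOCAB_SIZE)]


def candidate_model(tokens: Sequence[int]) -> List[float]:
    """Toy candidate: same argmax, slightly different logit values."""
    target = (tokens[-1] + 1) % VOCAB_SIZE
    return [0.97 if i == target else 0.02 * i / VOCAB_SIZE
            for i in range(VOCAB_SIZE)]


prompts = [[0], [3, 4]]
print(outputs_match(reference_model, candidate_model, prompts, 4))
print(max_abs_logit_diff(reference_model, candidate_model, [0]))
```

The two checks fail in different ways. Greedy equality is strict but brittle when two logits are nearly tied, while a logit tolerance absorbs small numerical noise. As you read the upstream tests, note which check each test uses and why.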

Questions to answer as you read

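  • The model contract: what does __init__(vllm_config) set up, and what does forward(input_ids, positions, ...) take and return (hidden states)?
  • vLLM building blocks: what do VocabParallelEmbedding, {Column,Row}ParallelLinear, Attention, and RMSNorm each do, and where does a model use them?
  • Weight loading: how does load_weights map parameter names from HF checkpoints onto the model's own parameters?
  • The model registry: how does an architecture name resolve to a class?
  • Families: how do decoder-only (Llama), MoE (Mixtral), hybrid/SSM (Mamba/Jamba), and pooling/reward models differ in what they implement?
  • Hooks: how are get_input_embeddings and tie_word_embeddings handled, and what does a model expose for LoRA and quantization compatibility?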
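
Two of the questions above (name-to-class resolution and checkpoint name remapping) can be sketched in plain Python before you read the real code. Everything here is an assumption made for illustration: `_REGISTRY`, `resolve_model_cls`, `STACKED_PARAMS`, and `remap_weight_name` are hypothetical names, stdlib classes stand in for model classes, and the mapping table only imitates the common pattern of fusing separate q/k/v and gate/up projections into one parameter. Verify the actual tables and function names in registry.py and llama.py at the pinned commit.

```python
import importlib
from typing import Dict, List, Optional, Tuple, Union

# Hypothetical registry: architecture name -> (module path, class name).
# Stdlib classes stand in for model classes so the sketch runs anywhere.
_REGISTRY: Dict[str, Tuple[str, str]] = {
    "ToyOrderedForCausalLM": ("collections", "OrderedDict"),
    "ToyCounterForCausalLM": ("collections", "Counter"),
}


def resolve_model_cls(architectures: List[str]) -> type:
    """Return the class for the first registered architecture name."""
    for arch in architectures:
        if arch in _REGISTRY:
            module_name, cls_name = _REGISTRY[arch]
            # Import lazily: registering a model should not import its code.
            module = importlib.import_module(module_name)
            return getattr(module, cls_name)
    raise ValueError(f"No registered architecture among {architectures}")


ShardId = Union[str, int, None]

# Hypothetical table: (fused parameter fragment, checkpoint fragment, shard).
# The checkpoint stores q/k/v (and gate/up) separately; the model is assumed
# to hold them as one fused parameter made of several shards.
STACKED_PARAMS: List[Tuple[str, str, ShardId]] = [
    (".qkv_proj", ".q_proj", "q"),
    (".qkv_proj", ".k_proj", "k"),
    (".qkv_proj", ".v_proj", "v"),
    (".gate_up_proj", ".gate_proj", 0),
    (".gate_up_proj", ".up_proj", 1),
]


def remap_weight_name(ckpt_name: str) -> Tuple[str, ShardId]:
    """Map a checkpoint tensor name to (model parameter name, shard id).

    Names that match no fused pattern pass through with shard id None.
    """
    for fused, source, shard_id in STACKED_PARAMS:
        if source in ckpt_name:
            return ckpt_name.replace(source, fused), shard_id
    return ckpt_name, None


print(resolve_model_cls(["UnknownArch", "ToyCounterForCausalLM"]).__name__)
print(remap_weight_name("model.layers.0.self_attn.k_proj.weight"))
print(remap_weight_name("model.layers.0.self_attn.o_proj.weight"))
```

The sketch takes a list of architecture names because a checkpoint's config may name more than one candidate. While reading, check whether the real registry behaves the same way, and what it does when nothing matches.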

Cross-references