Phase 14 — Cheatsheet: Model Architectures (Adding a Model)
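- Model recipe: build from vLLM parallel layers + Attention -> register the architecture -> remap checkpoint names in load_weights -> test against the HF (Hugging Face Transformers) implementation.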
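- Fused weights (QKV, gate_up) are the usual load_weights gotcha: many HF checkpoints store q/k/v and gate/up as separate tensors, and each one must be loaded into the correct shard of a single fused parameter.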
- Interfaces/mixins (e.g. SupportsLoRA, SupportsPP, SupportsMultiModal) declare LoRA, pipeline-parallel (PP), multimodal, and pooling support.
- Families: decoder-only, MoE, hybrid/SSM (Mamba), embedding/reward.
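The fused-weight remap in load_weights can be sketched without any dependencies. This is a minimal sketch that assumes toy row-major matrices (plain Python lists) instead of tensors. The mapping table mirrors the stacked_params_mapping pattern in vLLM's llama.py. Everything else is hypothetical: `load_weights`, `shard_row_offset`, `rows`, and the sizes are stand-ins, not vLLM's implementation. In vLLM, each parameter's `weight_loader` also splits shards across tensor-parallel ranks.

```python
from typing import Dict, List, Tuple, Union

Matrix = List[List[float]]  # toy weight: one inner list per output row
ShardId = Union[str, int]

# (fused param piece, checkpoint piece, shard id), modeled on the
# stacked_params_mapping table in vLLM's llama.py
STACKED_PARAMS_MAPPING: List[Tuple[str, str, ShardId]] = [
    (".qkv_proj", ".q_proj", "q"),
    (".qkv_proj", ".k_proj", "k"),
    (".qkv_proj", ".v_proj", "v"),
    (".gate_up_proj", ".gate_proj", 0),
    (".gate_up_proj", ".up_proj", 1),
]


def shard_row_offset(shard_id: ShardId, q_size: int, kv_size: int,
                     intermediate_size: int) -> int:
    """Hypothetical helper: first output row of a shard in its fused param."""
    offsets = {"q": 0, "k": q_size, "v": q_size + kv_size,
               0: 0, 1: intermediate_size}
    return offsets[shard_id]


def load_weights(params: Dict[str, Matrix], checkpoint: Dict[str, Matrix],
                 q_size: int, kv_size: int, intermediate_size: int) -> None:
    """Hypothetical loader: copy checkpoint tensors into (possibly fused) params."""
    for ckpt_name, tensor in checkpoint.items():
        for fused_piece, ckpt_piece, shard_id in STACKED_PARAMS_MAPPING:
            if ckpt_piece not in ckpt_name:
                continue
            target = params[ckpt_name.replace(ckpt_piece, fused_piece)]
            start = shard_row_offset(shard_id, q_size, kv_size,
                                     intermediate_size)
            # write this tensor into its row slice of the fused parameter
            target[start:start + len(tensor)] = [list(r) for r in tensor]
            break
        else:
            # no fused mapping matched (e.g. o_proj): plain 1:1 name match
            params[ckpt_name][:] = [list(r) for r in tensor]


def rows(n: int, value: float, cols: int = 2) -> Matrix:
    return [[value] * cols for _ in range(n)]


HIDDEN, Q, KV, INTER = 2, 4, 2, 3  # toy GQA sizes: fewer K/V rows than Q rows
params = {
    "layers.0.self_attn.qkv_proj.weight": rows(Q + 2 * KV, 0.0),
    "layers.0.self_attn.o_proj.weight": rows(HIDDEN, 0.0, cols=Q),
    "layers.0.mlp.gate_up_proj.weight": rows(2 * INTER, 0.0),
}
checkpoint = {  # HF-style names with separate q/k/v and gate/up tensors
    "layers.0.self_attn.q_proj.weight": rows(Q, 1.0),
    "layers.0.self_attn.k_proj.weight": rows(KV, 2.0),
    "layers.0.self_attn.v_proj.weight": rows(KV, 3.0),
    "layers.0.self_attn.o_proj.weight": rows(HIDDEN, 4.0, cols=Q),
    "layers.0.mlp.gate_proj.weight": rows(INTER, 5.0),
    "layers.0.mlp.up_proj.weight": rows(INTER, 6.0),
}
load_weights(params, checkpoint, Q, KV, INTER)
print([r[0] for r in params["layers.0.self_attn.qkv_proj.weight"]])
# [1.0, 1.0, 1.0, 1.0, 2.0, 2.0, 3.0, 3.0]
print([r[0] for r in params["layers.0.mlp.gate_up_proj.weight"]])
# [5.0, 5.0, 5.0, 6.0, 6.0, 6.0]
```

The gotcha shows up in the offsets. Under GQA, K and V are smaller than Q, so the v shard starts at q_size + kv_size, not at 2 * q_size.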
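The "test vs HF" step usually means running the same prompts greedily through Hugging Face Transformers and through vLLM, then comparing the results. Exact token equality is too strict, because small numerical differences can flip near-tied tokens. As far as we know, the helpers under tests/models/ (for example `check_logprobs_close`) compare top-k logprobs instead. The function below is a simplified, hypothetical re-creation of that idea on toy data, not the test helper itself. `logprobs_close`, `Generation`, and all of the numbers are illustrative.

```python
from typing import Dict, List, Tuple

# One greedy generation: (token ids, per-position top-k {token_id: logprob})
Generation = Tuple[List[int], List[Dict[int, float]]]


def logprobs_close(ref: Generation, test: Generation) -> bool:
    """Hypothetical check: outputs may diverge only at a near-tie, i.e. where
    each side's chosen token appears in the other side's top-k candidates."""
    ref_ids, ref_top = ref
    test_ids, test_top = test
    for pos, (a, b) in enumerate(zip(ref_ids, test_ids)):
        if a == b:
            continue
        # Only the first divergence is checked: after it, the two sequences
        # are conditioned on different prefixes and are no longer comparable.
        return a in test_top[pos] and b in ref_top[pos]
    return True


hf = ([5, 9, 2], [{5: -0.1, 7: -2.3}, {9: -0.5, 4: -0.9}, {2: -0.2, 1: -3.0}])
same = hf
near_tie = ([5, 4, 8], [{5: -0.1, 7: -2.3}, {4: -0.6, 9: -0.7}, {8: -0.3, 2: -1.5}])
wrong = ([5, 3, 1], [{5: -0.1, 7: -2.3}, {3: -0.4, 6: -1.0}, {1: -0.3, 0: -2.0}])
print(logprobs_close(hf, same), logprobs_close(hf, near_tie),
      logprobs_close(hf, wrong))
# True True False
```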
Key upstream files
- vllm/model_executor/models/llama.py
- vllm/model_executor/models/registry.py
- vllm/model_executor/model_loader/
- vllm/model_executor/models/mamba.py
- vllm/model_executor/models/interfaces.py
- tests/models/
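vLLM chooses the model class by matching the `architectures` list in the checkpoint's Hugging Face config.json against the names known to registry.py. Out-of-tree models can be added with `ModelRegistry.register_model`. Below is a hypothetical, dependency-free toy of that lookup, together with class-level capability flags in the spirit of interfaces.py. The module-level `register_model`, `resolve_model_cls`, `has_capability`, and the marker and model classes are illustrative names, not vLLM's API.

```python
from typing import ClassVar, Dict, List, Literal, Type


# Hypothetical capability markers (stand-ins, not imported from interfaces.py)
class LoRAMarker:
    supports_lora: ClassVar[Literal[True]] = True


class PPMarker:
    supports_pp: ClassVar[Literal[True]] = True


class ToyLlamaForCausalLM(LoRAMarker, PPMarker):
    """Decoder-only toy model declaring LoRA + pipeline-parallel support."""


class ToyMambaForCausalLM(PPMarker):
    """SSM toy model declaring only pipeline-parallel support."""


_REGISTRY: Dict[str, Type] = {}


def register_model(arch: str, cls: Type) -> None:
    """Toy registry: map an architecture name (as in config.json) to a class."""
    _REGISTRY[arch] = cls


def resolve_model_cls(architectures: List[str]) -> Type:
    """Return the first registered class among the config's architectures."""
    for arch in architectures:
        if arch in _REGISTRY:
            return _REGISTRY[arch]
    raise ValueError(f"no registered model for {architectures}")


def has_capability(cls: Type, flag: str) -> bool:
    """Read a class-level marker flag; a missing flag means unsupported."""
    return bool(getattr(cls, flag, False))


register_model("ToyLlamaForCausalLM", ToyLlamaForCausalLM)
register_model("ToyMambaForCausalLM", ToyMambaForCausalLM)

hf_config = {"architectures": ["ToyMambaForCausalLM"]}  # as in config.json
cls = resolve_model_cls(hf_config["architectures"])
print(cls.__name__, has_capability(cls, "supports_lora"),
      has_capability(cls, "supports_pp"))
# ToyMambaForCausalLM False True
```

Declaring support through class attributes lets a caller inspect what a model supports before instantiating it or loading any weights.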
Full reference: 00-guide.md · 01-deep-dive.md