Phase 09 — Cheatsheet: Sampling & Decoding Algorithms

The one-liner

Logits → pick a token. The pipeline (penalties → temperature → top-k → top-p/min-p → sample) runs vectorized across a heterogeneous batch, every row with its own params.

The knobs

  • greedy = T=0 = argmax (deterministic)
  • temperature T: <1 sharper, >1 flatter
  • top-k: keep k highest; top-p: keep nucleus (smallest set with cum prob ≥ p); min-p: keep prob ≥ min_p × max_prob
  • penalties: repetition (multiplicative) / frequency (× count) / presence (flat); logit bias; bad-words
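
The knobs above can be sketched for a single row in plain Python. This is a minimal illustration, not the upstream tensor code: the function names (`softmax`, `apply_top_k`, `apply_top_p`, `apply_min_p`) are hypothetical, and each filter renormalizes so the next one sees a proper distribution.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax: T < 1 sharpens, T > 1 flattens."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def _renormalize(probs):
    total = sum(probs)
    return [p / total for p in probs]

def _descending(probs):
    return sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)

def apply_top_k(probs, k):
    """Keep the k most probable tokens."""
    keep = set(_descending(probs)[:k])
    return _renormalize([p if i in keep else 0.0 for i, p in enumerate(probs)])

def apply_top_p(probs, top_p):
    """Keep the smallest prefix of the sorted tokens whose mass reaches top_p."""
    keep, cum = set(), 0.0
    for i in _descending(probs):
        keep.add(i)
        cum += probs[i]
        if cum >= top_p:
            break
    return _renormalize([p if i in keep else 0.0 for i, p in enumerate(probs)])

def apply_min_p(probs, min_p):
    """Keep tokens whose probability is at least min_p times the maximum."""
    threshold = min_p * max(probs)
    return _renormalize([p if p >= threshold else 0.0 for p in probs])
```

Greedy decoding skips all of this and takes the argmax of the raw logits, which also avoids dividing by T=0.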

Logits processors

The pluggable pre-sampling hook. One mechanism for penalties, bias, bad-words, AND grammar masks (Phase 12: illegal tokens → -inf). logits_processor/{interface,builtin,state}.py.
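
As a rough sketch of the idea, a logits processor can be modeled as a callable that takes the tokens generated so far plus the current logits and returns adjusted logits. The factory names and the `(output_ids, logits)` signature below are assumptions for illustration and do not mirror the actual interface in logits_processor/.

```python
import math

NEG_INF = -math.inf

def make_logit_bias(bias):
    """bias: dict mapping token id -> additive offset (hypothetical helper)."""
    def process(output_ids, logits):
        return [x + bias.get(i, 0.0) for i, x in enumerate(logits)]
    return process

def make_allowed_mask(allowed_fn):
    """allowed_fn(output_ids) -> set of token ids legal at the next step.

    Stands in for a grammar: illegal tokens are pushed to -inf so they
    can never be sampled.
    """
    def process(output_ids, logits):
        allowed = allowed_fn(output_ids)
        return [x if i in allowed else NEG_INF for i, x in enumerate(logits)]
    return process

def apply_processors(processors, output_ids, logits):
    """Run every processor in order; each sees the previous one's output."""
    for proc in processors:
        logits = proc(output_ids, logits)
    return logits
```

Because bias, bad-words, and grammar masks all reduce to "edit the logits before sampling", a single chain like `apply_processors` is enough to host them all.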

Batching

Per-request params packed into tensors (SamplingMetadata); masked branch-free ops apply each row's settings in one pass. No Python loop on the hot path.
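
The branch-free pattern can be sketched with NumPy standing in for the real tensor library. The function `batched_sample` and its plain-array arguments are hypothetical (the real code packs these into SamplingMetadata); here a temperature of 0 is assumed to mean greedy and a top_k of 0 to mean "disabled".

```python
import numpy as np

def batched_sample(logits, temperature, top_k, rng):
    """logits: [B, V] floats; temperature: [B] floats; top_k: [B] ints."""
    B, V = logits.shape
    greedy = temperature == 0.0

    # Greedy rows get a dummy temperature; their sampled token is discarded.
    safe_t = np.where(greedy, 1.0, temperature)
    scaled = logits / safe_t[:, None]

    # Per-row top-k without a loop: mask everything below each row's
    # k-th largest value (ties at the threshold are kept).
    k = np.where(top_k == 0, V, top_k)
    sorted_desc = -np.sort(-scaled, axis=1)
    kth = sorted_desc[np.arange(B), k - 1]
    scaled = np.where(scaled < kth[:, None], -np.inf, scaled)

    # Row-wise softmax; masked entries become exactly 0.
    scaled = scaled - scaled.max(axis=1, keepdims=True)
    probs = np.exp(scaled)
    probs /= probs.sum(axis=1, keepdims=True)

    # Exponential-race trick (equivalent to Gumbel-max): one argmax per row
    # draws from each row's categorical distribution.
    noise = rng.exponential(size=probs.shape)
    sampled = (probs / noise).argmax(axis=1)

    # Select greedy or sampled per row with a mask instead of an if.
    return np.where(greedy, logits.argmax(axis=1), sampled)
```

Every row takes the same code path; the per-row settings only change the masks and thresholds, which is what keeps a Python loop off the hot path.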

n>1: one prefill, N samples share prompt KV (prefix caching), diverge after token 1 (parallel_sampling.py). Beam search: keep the top-N partial sequences by cumulative log-prob; awkward in continuous batching (active set changes), handled specially.
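
The beam-search bookkeeping can be sketched as one expand-and-prune step. This is a toy illustration under stated assumptions: `beam_step` and the `next_logprobs` callback are hypothetical names, and a real engine would score all beams in one batched forward pass rather than call a function per beam.

```python
import heapq
import math

def beam_step(beams, next_logprobs, beam_width):
    """Expand every beam by one token and keep the best beam_width.

    beams: list of (tokens, cum_logprob), where tokens is a tuple.
    next_logprobs(tokens) -> dict mapping next token -> log-probability.
    """
    candidates = []
    for tokens, score in beams:
        for tok, lp in next_logprobs(tokens).items():
            candidates.append((tokens + (tok,), score + lp))
    # Rank by cumulative log-prob; survivors may all descend from one parent.
    return heapq.nlargest(beam_width, candidates, key=lambda c: c[1])

# Toy model: the next-token distribution depends only on the last token.
TOY = {
    None: {"A": 0.6, "B": 0.4},
    "A": {"A": 0.55, "B": 0.45},
    "B": {"A": 0.9, "B": 0.1},
}

def toy_logprobs(tokens):
    last = tokens[-1] if tokens else None
    return {tok: math.log(p) for tok, p in TOY[last].items()}

beams = [((), 0.0)]
for _ in range(2):
    beams = beam_step(beams, toy_logprobs, beam_width=2)
```

In this toy run the best two-token sequence starts with "B" even though "A" is the more likely first token, which is the case greedy decoding misses. Each step forks and prunes sequences, which is the churn that makes beam search awkward for a continuous-batching scheduler.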

Key upstream

  • v1/sample/sampler.py:20 Sampler · :67 forward · :223 apply_temperature · :238 sample
  • v1/sample/ops/topk_topp_sampler.py · ops/penalties.py · ops/bad_words.py
  • v1/sample/logits_processor/ · v1/sample/metadata.py · sampling_params.py:168

Full: 00-guide.md · 01-deep-dive.md · INTERVIEW.md