Phase 09 — Mini-Build: a sampling pipeline with logits processors
You already have mini_vllm/sampler.py (greedy, temperature, top-k, top-p). This phase adds the
three things real engines need: min-p, a repetition penalty, and a logits-processor hook (the
pluggable pre-sampling stage that penalties and structured output in Phase 12 both plug into).
Contents
- The task (lab-01)
- Why a logits-processor hook (not just hardcoded knobs)?
- Definition of done
- Map to the real engine
The task (lab-01)
In lab-01-sampling-ops, implement in numpy:
- apply_min_p(logits, min_p): keep tokens with prob ≥ min_p × max_prob and mask the rest to -inf.
- apply_repetition_penalty(logits, generated_token_ids, penalty): make already-generated tokens less likely. For the multiplicative penalty, divide their positive logits by penalty and multiply their negative logits by it (dividing a negative logit would raise it); additive variants subtract instead.
- a LogitsProcessor protocol: a callable (logits, context) -> logits, plus a Pipeline that runs a list of processors in order.
- sample(logits, params, generated, processors): apply processors → penalty → temperature → top-k → top-p/min-p → sample, in that order.
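A minimal numpy sketch of the two new ops for a single sequence (1-D logits). The softmax helper is an assumption, and the penalty follows the common multiplicative convention (as in Hugging Face's RepetitionPenaltyLogitsProcessor): divide positive logits, multiply negative ones. The lab's own tests may pin details differently.

```python
import numpy as np


def softmax(logits: np.ndarray) -> np.ndarray:
    """Stable softmax over the last axis; -inf logits get probability 0."""
    shifted = logits - np.max(logits, axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=-1, keepdims=True)


def apply_min_p(logits: np.ndarray, min_p: float) -> np.ndarray:
    """Keep tokens with prob >= min_p * max_prob; set the rest to -inf."""
    probs = softmax(logits)
    threshold = min_p * np.max(probs, axis=-1, keepdims=True)
    return np.where(probs >= threshold, logits, -np.inf)


def apply_repetition_penalty(logits: np.ndarray, generated_token_ids, penalty: float) -> np.ndarray:
    """Lower the logits of already-generated ids (penalty > 1 penalizes).

    Positive logits are divided by `penalty`, negative ones multiplied by it,
    so both move down. An id is penalized once however often it repeated.
    """
    out = np.array(logits, dtype=np.float64)  # copy: never mutate the caller's logits
    ids = np.unique(np.asarray(generated_token_ids, dtype=np.int64))
    if ids.size == 0:
        return out
    seen = out[ids]
    out[ids] = np.where(seen > 0, seen / penalty, seen * penalty)
    return out


# Same min_p, different confidence: the cutoff is relative to the top token.
peaked = np.log(np.array([0.90, 0.06, 0.04]))  # cutoff 0.1 * 0.90 = 0.09
flat = np.log(np.array([0.40, 0.35, 0.25]))    # cutoff 0.1 * 0.40 = 0.04
print(np.isneginf(apply_min_p(peaked, 0.1)).tolist())  # [False, True, True]
print(np.isneginf(apply_min_p(flat, 0.1)).tolist())    # [False, False, False]

# Token 0 (positive) is halved, token 2 (negative) is doubled; id 2 counts once.
print(apply_repetition_penalty(np.array([2.0, 1.0, -1.0, 0.5]), [0, 2, 2], 2.0).tolist())
# [1.0, 1.0, -2.0, 0.5]
```

The peaked/flat pair is the point of min-p: a confident distribution prunes its tail aggressively, while a flat one keeps most candidates under the same setting.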
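This mirrors Sampler.forward (sampler.py:67) and the LogitsProcessor framework
(logits_processor/interface.py). The pipeline order is the contract: temperature rescales the
probabilities that top-p and min-p threshold on, so moving a step changes which tokens survive.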
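A minimal sketch of the protocol, the Pipeline, and sample() wired in the stated order. SamplingParams and its field names, the context dict's "generated" key, and the rng argument are hypothetical choices for this sketch, not the lab's required signatures; the penalty and min-p steps are inlined compactly so the block runs on its own.

```python
from dataclasses import dataclass, field
from typing import Optional, Protocol, Sequence

import numpy as np


class LogitsProcessor(Protocol):
    """Structural type: anything callable as (logits, context) -> logits."""

    def __call__(self, logits: np.ndarray, context: dict) -> np.ndarray: ...


@dataclass
class Pipeline:
    """Runs processors in list order; each one sees the previous one's output."""

    processors: list = field(default_factory=list)

    def __call__(self, logits: np.ndarray, context: dict) -> np.ndarray:
        for proc in self.processors:
            logits = proc(logits, context)
        return logits


@dataclass
class SamplingParams:  # hypothetical container; field names are assumptions
    temperature: float = 1.0         # 0.0 means greedy
    top_k: int = 0                   # 0 disables top-k
    top_p: float = 1.0               # 1.0 disables top-p
    min_p: float = 0.0               # 0.0 disables min-p
    repetition_penalty: float = 1.0  # 1.0 disables the penalty


def _probs(x: np.ndarray) -> np.ndarray:
    e = np.exp(x - np.max(x))  # -inf logits become probability 0
    return e / e.sum()


def sample(logits, params: SamplingParams, generated: Sequence[int],
           processors: Sequence[LogitsProcessor],
           rng: Optional[np.random.Generator] = None) -> int:
    rng = np.random.default_rng() if rng is None else rng
    x = np.array(logits, dtype=np.float64)  # private copy
    # 1. Logits processors (bias, bad words, grammar masks, ...).
    x = Pipeline(list(processors))(x, {"generated": list(generated)})
    # 2. Repetition penalty on every id generated so far.
    ids = np.unique(np.asarray(generated, dtype=np.int64))
    if ids.size and params.repetition_penalty != 1.0:
        pen, seen = params.repetition_penalty, x[ids]
        x[ids] = np.where(seen > 0, seen / pen, seen * pen)
    if params.temperature == 0.0:
        return int(np.argmax(x))  # greedy: the later filters never drop the argmax
    # 3. Temperature.
    x = x / params.temperature
    # 4. Top-k (ties with the k-th largest logit are kept).
    if 0 < params.top_k < x.size:
        kth = np.sort(x)[-params.top_k]
        x = np.where(x >= kth, x, -np.inf)
    # 5a. Top-p: keep the smallest high-probability prefix whose mass reaches top_p.
    if params.top_p < 1.0:
        p = _probs(x)
        order = np.argsort(-p)
        cut = int(np.searchsorted(np.cumsum(p[order]), params.top_p)) + 1
        x[order[cut:]] = -np.inf
    # 5b. Min-p: drop tokens below min_p times the top probability.
    if params.min_p > 0.0:
        p = _probs(x)
        x = np.where(p >= params.min_p * p.max(), x, -np.inf)
    # 6. Sample from what survived.
    return int(rng.choice(x.size, p=_probs(x)))
```

The greedy shortcut returns before the temperature step to avoid dividing by zero; skipping the filters is safe there because top-k (k ≥ 1), top-p, and min-p (≤ 1) always keep the highest-probability token.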
Why a logits-processor hook (not just hardcoded knobs)?
Because the same mechanism serves penalties, logit bias, bad-words, and grammar masks
(Phase 12). Build it once as "a function that edits logits at a defined point" and structured
output becomes "just another processor that masks illegal tokens to -inf." You'll literally
reuse this pipeline in Phase 12.
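A minimal sketch of two processors with the (logits, context) -> logits shape: a bad-words style ban and a grammar-style allow-list mask. The class names, the run_pipeline helper, and the "allowed" context key are hypothetical; a real grammar backend would derive the allowed set from its parser state at every decoding step rather than read it from a dict.

```python
from typing import Callable, Iterable, Sequence

import numpy as np

# Hypothetical alias for this sketch: (logits, context) -> logits.
Processor = Callable[[np.ndarray, dict], np.ndarray]


class BanTokens:
    """Bad-words style processor: pushes the listed token ids to -inf."""

    def __init__(self, banned_ids: Iterable[int]):
        self.banned = np.fromiter(banned_ids, dtype=np.int64)

    def __call__(self, logits: np.ndarray, context: dict) -> np.ndarray:
        out = logits.copy()  # leave the caller's array untouched
        out[self.banned] = -np.inf
        return out


class AllowOnly:
    """Grammar-mask style processor: every token outside the allowed set goes to -inf."""

    def __call__(self, logits: np.ndarray, context: dict) -> np.ndarray:
        mask = np.full(logits.shape, -np.inf)
        mask[list(context["allowed"])] = 0.0  # 0 keeps a logit, -inf removes it
        return logits + mask


def run_pipeline(logits: np.ndarray, context: dict, processors: Sequence[Processor]) -> np.ndarray:
    for proc in processors:
        logits = proc(logits, context)
    return logits


logits = np.array([0.5, 2.0, 1.0, -0.3])
out = run_pipeline(logits, {"allowed": {0, 1, 3}}, [BanTokens([1]), AllowOnly()])
print(out.tolist())  # [0.5, -inf, -inf, -0.3]
```

Because both processors only push logits to -inf, applying them in either order gives the same result; here token 1 is banned outright and token 2 falls outside the allowed set, so the greedy pick becomes token 0.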
Definition of done
pytest phase-09-sampling-and-decoding-algorithms/labs -q
The tests pin down five behaviors: top-k with k=1 restricts support to the argmax; top-p keeps the nucleus; the min-p cutoff is confidence-relative (it scales with the top token's probability); a repetition penalty lowers a repeated token's probability; and a banning logits processor makes a token unsamplable.
Map to the real engine
| your numpy | real vLLM |
|---|---|
| pipeline order | Sampler.forward (sampler.py:67) |
| apply_min_p, top-k/p | ops/topk_topp_sampler.py (vectorized over the batch) |
| repetition penalty | ops/penalties.py |
| LogitsProcessor + Pipeline | logits_processor/{interface,builtin,state}.py |
| a banning processor | ops/bad_words.py + the grammar mask (Phase 12) |