Phase 09 — Mini-Build: a sampling pipeline with logits processors

You already have mini_vllm/sampler.py (greedy, temperature, top-k, top-p). This phase adds three things real engines need: min-p, a repetition penalty, and a logits-processor hook, the pluggable pre-sampling stage that penalties and structured output (Phase 12) build on.

The task (lab-01)

In lab-01-sampling-ops implement, in numpy:

  • apply_min_p(logits, min_p) — keep tokens with prob ≥ min_p × max_prob, mask the rest.
  • apply_repetition_penalty(logits, generated_token_ids, penalty) — divide the logits of already-generated tokens by the penalty (multiplying instead when a logit is negative, so the penalty always pushes it down) so repeats are less likely.
  • a LogitsProcessor protocol: a callable (logits, context) -> logits, and a Pipeline that runs a list of processors in order.
  • sample(logits, params, generated, processors) — apply processors → penalty → temperature → top-k → top-p/min-p → sample, in that order.
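As a starting point, the first two operations might look like the sketch below. It assumes 1-D logits for a single sequence, and it assumes the common convention for the repetition penalty (divide positive logits, multiply negative ones); the softmax helper is an illustrative addition, not something the lab prescribes.

```python
import numpy as np


def softmax(logits: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over a 1-D logits vector (illustrative helper)."""
    exp = np.exp(logits - np.max(logits))
    return exp / exp.sum()


def apply_min_p(logits: np.ndarray, min_p: float) -> np.ndarray:
    """Mask (set to -inf) tokens whose probability is below min_p * max_prob."""
    if min_p <= 0.0:
        return logits  # min-p disabled
    probs = softmax(logits)
    keep = probs >= min_p * probs.max()
    return np.where(keep, logits, -np.inf)


def apply_repetition_penalty(logits, generated_token_ids, penalty):
    """Assumed convention: divide positive logits, multiply negative ones."""
    out = np.asarray(logits, dtype=np.float64).copy()
    ids = np.unique(np.asarray(generated_token_ids, dtype=np.int64))
    if ids.size == 0 or penalty == 1.0:
        return out
    seen = out[ids]
    out[ids] = np.where(seen > 0, seen / penalty, seen * penalty)
    return out


logits = np.array([2.0, 1.0, 0.0, -1.0])
print(apply_min_p(logits, 0.2))                       # tokens 2 and 3 become -inf
print(apply_repetition_penalty(logits, [0, 3], 2.0))  # token 0 -> 1.0, token 3 -> -2.0
```

Handling the sign matters: dividing a negative logit by a penalty greater than 1 would move it toward zero and make the repeat more likely, not less.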

This mirrors Sampler.forward (sampler.py:67) and the LogitsProcessor framework (logits_processor/interface.py). The pipeline order is the contract.
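To make the ordering concrete, here is a minimal single-sequence sketch of sample. Several details are assumptions made for illustration rather than the lab's required API: the SamplingParams dataclass and its field names, the rng argument, treating temperature 0 as greedy, and passing the generated token ids as the processors' context.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class SamplingParams:  # hypothetical container; field names are assumptions
    temperature: float = 1.0
    top_k: int = 0                   # 0 disables top-k
    top_p: float = 1.0               # 1.0 disables top-p
    min_p: float = 0.0               # 0.0 disables min-p
    repetition_penalty: float = 1.0  # 1.0 disables the penalty


def _softmax(x: np.ndarray) -> np.ndarray:
    e = np.exp(x - np.max(x))
    return e / e.sum()


def sample(logits, params, generated=(), processors=(), rng=None) -> int:
    if rng is None:
        rng = np.random.default_rng()
    x = np.asarray(logits, dtype=np.float64).copy()

    # 1. Logits processors, in list order.
    for proc in processors:
        x = proc(x, generated)

    # 2. Repetition penalty (assumed: divide positive, multiply negative).
    if len(generated) > 0 and params.repetition_penalty != 1.0:
        ids = np.unique(np.asarray(generated, dtype=np.int64))
        seen = x[ids]
        x[ids] = np.where(seen > 0, seen / params.repetition_penalty,
                          seen * params.repetition_penalty)

    # 3. Temperature (0 is treated as greedy in this sketch).
    if params.temperature == 0.0:
        return int(np.argmax(x))
    x = x / params.temperature

    # 4. Top-k: keep the k largest logits (ties at the cutoff survive).
    if 0 < params.top_k < x.size:
        kth = np.sort(x)[-params.top_k]
        x = np.where(x >= kth, x, -np.inf)

    # 5a. Top-p: drop a token once the mass ranked before it reaches top_p.
    if params.top_p < 1.0:
        order = np.argsort(-x)
        probs = _softmax(x)[order]
        mass_before = np.cumsum(probs) - probs
        x[order[mass_before >= params.top_p]] = -np.inf

    # 5b. Min-p: keep tokens with prob >= min_p * max_prob.
    if params.min_p > 0.0:
        probs = _softmax(x)
        x = np.where(probs >= params.min_p * probs.max(), x, -np.inf)

    # 6. Draw from what is left.
    return int(rng.choice(x.size, p=_softmax(x)))
```

Because masks are written as -inf and every later stage preserves -inf, a token removed by an early stage cannot reappear further down the pipeline.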

Why a logits-processor hook (not just hardcoded knobs)?

Because the same mechanism serves penalties, logit bias, bad-words, and grammar masks (Phase 12). Build it once as "a function that edits logits at a defined point" and structured output becomes "just another processor that masks illegal tokens to -inf." You'll literally reuse this pipeline in Phase 12.
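A rough sketch of that idea follows, using typing.Protocol for the callable shape. The class names BanTokens and LogitBias are hypothetical illustrations, and the sketch assumes the context is simply the sequence of generated token ids.

```python
from typing import Dict, Iterable, Protocol, Sequence

import numpy as np


class LogitsProcessor(Protocol):
    """Anything callable as (logits, context) -> logits."""

    def __call__(self, logits: np.ndarray, context: Sequence[int]) -> np.ndarray: ...


class Pipeline:
    """Runs processors in list order; each one sees the previous one's output."""

    def __init__(self, processors: Iterable[LogitsProcessor]):
        self.processors = list(processors)

    def __call__(self, logits: np.ndarray, context: Sequence[int]) -> np.ndarray:
        for proc in self.processors:
            logits = proc(logits, context)
        return logits


class BanTokens:
    """Hypothetical processor: masks a fixed set of token ids to -inf."""

    def __init__(self, banned_ids: Iterable[int]):
        self.banned_ids = np.asarray(sorted(set(banned_ids)), dtype=np.int64)

    def __call__(self, logits, context):
        out = np.asarray(logits, dtype=np.float64).copy()
        out[self.banned_ids] = -np.inf
        return out


class LogitBias:
    """Hypothetical processor: adds a per-token bias to the logits."""

    def __init__(self, bias: Dict[int, float]):
        self.bias = dict(bias)

    def __call__(self, logits, context):
        out = np.asarray(logits, dtype=np.float64).copy()
        for token_id, value in self.bias.items():
            out[token_id] += value
        return out


pipeline = Pipeline([LogitBias({2: 1.5}), BanTokens([0])])
out = pipeline(np.array([3.0, 1.0, 0.0]), context=[])
print(out)  # token 0 is -inf, token 2 is 1.5
```

A grammar mask would fit the same shape: a processor that reads the context, works out which tokens are legal next, and writes -inf everywhere else.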

Definition of done

pytest phase-09-sampling-and-decoding-algorithms/labs -q

Tests pin: top-k restricts support to the argmax when k=1; top-p keeps the nucleus; min-p cutoff is confidence-relative; repetition penalty lowers a repeated token's probability; a banning logits processor makes a token unsamplable.
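"Confidence-relative" is the property that separates min-p from top-k: the same min_p keeps few tokens when the model is confident and many when it is not. The check below is only an illustration with made-up logits and a hypothetical helper name; it uses the fact that the softmax normalizer cancels, so prob_i / max_prob equals exp(logit_i - max_logit).

```python
import numpy as np


def min_p_keep_mask(logits: np.ndarray, min_p: float) -> np.ndarray:
    """True where prob_i >= min_p * max_prob, computed without a full softmax."""
    # prob_i / max_prob == exp(logit_i - max_logit), since the normalizer cancels.
    return np.exp(logits - logits.max()) >= min_p


peaked = np.array([5.0, 1.0, 0.5, 0.0])  # one dominant token
flat = np.array([1.0, 0.9, 0.8, 0.7])    # nearly uniform

print(int(min_p_keep_mask(peaked, 0.1).sum()))  # 1
print(int(min_p_keep_mask(flat, 0.1).sum()))    # 4
```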

Map to the real engine

| your numpy | real vLLM |
|---|---|
| pipeline order | Sampler.forward (sampler.py:67) |
| apply_min_p, top-k/p | ops/topk_topp_sampler.py (vectorized over the batch) |
| repetition penalty | ops/penalties.py |
| LogitsProcessor + Pipeline | logits_processor/{interface,builtin,state}.py |
| a banning processor | ops/bad_words.py + the grammar mask (Phase 12) |