Phase 09 — Mini-Build: a sampling pipeline with logits processors

You already have mini_vllm/sampler.py (greedy, temperature, top-k, top-p). This phase adds the three things real engines need: min-p, a repetition penalty, and a logits-processor hook, the pluggable pre-sampling stage that penalties and structured output (Phase 12) ride on.

The task (lab-01)
Why a logits-processor hook (not just hardcoded knobs)?
Definition of done
Map to the real engine

The task (lab-01)

In lab-01-sampling-ops, implement the following in numpy:

apply_min_p(logits, min_p) — keep tokens with prob ≥ min_p × max_prob, mask the rest.
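apply_repetition_penalty(logits, generated_token_ids, penalty) — lower the logits of already-generated tokens so repeats are less likely. The usual multiplicative form divides positive logits by penalty and multiplies negative ones by it (dividing a negative logit would raise it); an additive variant subtracts a fixed amount instead.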
a LogitsProcessor protocol: a callable (logits, context) -> logits, and a Pipeline that runs a list of processors in order.
sample(logits, params, generated, processors) — apply processors → penalty → temperature → top-k → top-p/min-p → sample, in that order.
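As a starting point, the first two functions can be sketched as below. This is a minimal sketch for a single 1-D logits vector, not the lab's reference solution: the `softmax` helper is an addition for illustration, and the penalty uses the multiplicative form (divide positive logits, multiply negative ones), which is an assumption about what the lab tests expect.

```python
import numpy as np


def softmax(logits: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over the last axis (illustrative helper)."""
    shifted = logits - np.max(logits, axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def apply_min_p(logits: np.ndarray, min_p: float) -> np.ndarray:
    """Keep tokens with prob >= min_p * max_prob; mask the rest to -inf."""
    if min_p <= 0.0:
        return logits  # min-p disabled
    probs = softmax(logits)
    threshold = min_p * probs.max(axis=-1, keepdims=True)
    return np.where(probs >= threshold, logits, -np.inf)


def apply_repetition_penalty(logits, generated_token_ids, penalty):
    """Make already-generated tokens less likely (multiplicative form).

    Assumes 1-D logits. Positive logits are divided by the penalty and
    negative logits are multiplied by it, so both move toward "less likely".
    """
    if penalty == 1.0 or len(generated_token_ids) == 0:
        return logits  # nothing to penalize
    out = logits.astype(np.float64, copy=True)
    # Deduplicate so a token seen twice is penalized once.
    ids = np.unique(np.asarray(generated_token_ids, dtype=np.int64))
    seen = out[ids]
    out[ids] = np.where(seen > 0, seen / penalty, seen * penalty)
    return out
```

With logits `[2.0, 1.0, 0.0, -1.0]`, `apply_min_p(logits, 0.2)` keeps the first two tokens and masks the last two, and `apply_repetition_penalty(logits, [0, 3, 3], 2.0)` halves the logit of token 0 and doubles the (negative) logit of token 3.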

This mirrors Sampler.forward (sampler.py:67) and the LogitsProcessor framework (logits_processor/interface.py). The pipeline order is the contract.
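To make the ordering contract concrete, here is one possible shape for `sample`, written as a self-contained sketch. Several details are assumptions rather than the lab's API: the `SamplingParams` field names, passing `context` as a plain dict, treating `temperature == 0` as greedy, the extra `rng` argument, and applying top-p before min-p within the last filtering stage.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class SamplingParams:  # hypothetical container; field names are assumptions
    temperature: float = 1.0
    top_k: int = 0                    # 0 disables top-k
    top_p: float = 1.0                # 1.0 disables top-p
    min_p: float = 0.0                # 0.0 disables min-p
    repetition_penalty: float = 1.0   # 1.0 disables the penalty


def _softmax(x: np.ndarray) -> np.ndarray:
    e = np.exp(x - np.max(x))
    return e / e.sum()


def sample(logits, params, generated=(), processors=(), rng=None) -> int:
    if rng is None:
        rng = np.random.default_rng()
    logits = np.asarray(logits, dtype=np.float64).copy()

    # 1. Logits processors run first, in list order.
    context = {"generated": list(generated)}
    for proc in processors:
        logits = proc(logits, context)

    # 2. Repetition penalty (multiplicative form).
    if params.repetition_penalty != 1.0 and len(generated) > 0:
        ids = np.unique(np.asarray(generated, dtype=np.int64))
        seen = logits[ids]
        logits[ids] = np.where(seen > 0,
                               seen / params.repetition_penalty,
                               seen * params.repetition_penalty)

    # 3. Temperature; 0 is treated as greedy decoding here.
    if params.temperature == 0.0:
        return int(np.argmax(logits))
    logits = logits / params.temperature

    # 4. Top-k: keep the k largest logits (ties at the cutoff are kept).
    if 0 < params.top_k < logits.size:
        kth = np.sort(logits)[-params.top_k]
        logits = np.where(logits >= kth, logits, -np.inf)

    # 5. Top-p, then min-p, both computed on probabilities.
    probs = _softmax(logits)
    if params.top_p < 1.0:
        order = np.argsort(-probs)
        cum = np.cumsum(probs[order])
        # Drop a token once the mass *before* it already reaches top_p.
        drop = (cum - probs[order]) >= params.top_p
        logits[order[drop]] = -np.inf
        probs = _softmax(logits)
    if params.min_p > 0.0:
        logits = np.where(probs >= params.min_p * probs.max(), logits, -np.inf)
        probs = _softmax(logits)

    # 6. Draw one token id from the filtered distribution.
    return int(rng.choice(probs.size, p=probs))
```

Under these assumptions, `sample([1.0, 3.0, 2.0], SamplingParams(top_k=1))` always returns token 1, because top-k with k=1 leaves only the argmax in the support. Running processors before every other stage means a token masked to -inf stays masked through temperature scaling and the top-k/top-p/min-p filters.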

Why a logits-processor hook (not just hardcoded knobs)?

Because the same mechanism serves penalties, logit bias, bad-words, and grammar masks (Phase 12). Build it once as "a function that edits logits at a defined point" and structured output becomes "just another processor that masks illegal tokens to -inf." You will reuse this same pipeline, unchanged, in Phase 12.
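The hook itself can be very small. The sketch below shows one way to express the protocol, the pipeline, and two processors. `BanTokens` and `AddBias` are hypothetical names chosen for illustration, and the `context` dict is an assumed shape; the lab may define these differently.

```python
from typing import Any, Protocol, Sequence

import numpy as np


class LogitsProcessor(Protocol):
    """Anything callable as (logits, context) -> logits."""

    def __call__(self, logits: np.ndarray, context: dict[str, Any]) -> np.ndarray:
        ...


class Pipeline:
    """Runs processors in list order; each sees the previous one's output."""

    def __init__(self, processors: Sequence[LogitsProcessor] = ()):
        self.processors = list(processors)

    def __call__(self, logits: np.ndarray, context: dict[str, Any]) -> np.ndarray:
        for proc in self.processors:
            logits = proc(logits, context)
        return logits


class BanTokens:
    """Hypothetical processor: make the given token ids unsamplable."""

    def __init__(self, banned_ids):
        self.banned_ids = np.asarray(list(banned_ids), dtype=np.int64)

    def __call__(self, logits, context):
        out = logits.astype(np.float64, copy=True)
        out[self.banned_ids] = -np.inf  # probability 0 after softmax
        return out


class AddBias:
    """Hypothetical processor: add a fixed bias to selected token ids."""

    def __init__(self, bias: dict[int, float]):
        self.bias = dict(bias)

    def __call__(self, logits, context):
        out = logits.astype(np.float64, copy=True)
        for token_id, value in self.bias.items():
            out[token_id] += value
        return out
```

For example, `Pipeline([AddBias({0: 5.0}), BanTokens([0])])` applied to `np.array([1.0, 2.0, 3.0])` leaves token 0 at -inf: the ban runs after the bias, so order in the list decides the outcome. A grammar mask fits the same shape as `BanTokens`, with the banned set computed from `context` at each step.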

Definition of done

pytest phase-09-sampling-and-decoding-algorithms/labs -q

Tests pin: top-k restricts support to the argmax when k=1; top-p keeps the nucleus; min-p cutoff is confidence-relative; repetition penalty lowers a repeated token's probability; a banning logits processor makes a token unsamplable.
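"Confidence-relative" is the property that separates min-p from a fixed probability cutoff, and a small check makes it visible. The helper name `count_kept_by_min_p` is hypothetical and is not part of the lab's test suite; the sketch only illustrates the behaviour the tests are described as pinning.

```python
import numpy as np


def count_kept_by_min_p(logits: np.ndarray, min_p: float) -> int:
    """Number of tokens with prob >= min_p * max_prob (illustrative helper)."""
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    return int(np.sum(probs >= min_p * probs.max()))


# Same min_p, two distributions over 5 tokens.
flat = np.zeros(5)                               # model is unsure: uniform
peaked = np.array([10.0, 0.0, 0.0, 0.0, 0.0])    # model is confident

kept_flat = count_kept_by_min_p(flat, 0.1)       # threshold scales with max prob 0.2
kept_peaked = count_kept_by_min_p(peaked, 0.1)   # threshold scales with max prob near 1
```

With `min_p = 0.1`, all five tokens survive on the flat distribution while only the top token survives on the peaked one: the cutoff tightens as the model grows more confident.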

Map to the real engine

your numpy	real vLLM
pipeline order	`Sampler.forward` (`sampler.py:67`)
`apply_min_p`, top-k/p	`ops/topk_topp_sampler.py` (vectorized over the batch)
repetition penalty	`ops/penalties.py`
`LogitsProcessor` + `Pipeline`	`logits_processor/{interface,builtin,state}.py`
a banning processor	`ops/bad_words.py` + the grammar mask (Phase 12)

vLLM Mastery — From Zero to Maintainer
