Phase 06 — Mini-Build: a per-channel int8 fake-quant linear

You'll build the smallest real quantization scheme: store a weight matrix in int8 with per-channel scales, dequantize in the matmul, and measure the two things that matter: memory saved and round-trip error. This mirrors what create_weights and apply do for a real method, minus the GPU kernel.

The task (lab-01)

Implement, in numpy:

  • quantize_per_channel(W) → (q_int8, scales), where W is (out, in), with one scale per output channel (row): scale[o] = max(abs(W[o])) / 127; q_int8[o] = round(W[o] / scale[o]), clipped to [-127, 127].
  • dequantize(q_int8, scales) → W_approx, computed as scales[:, None] * q_int8.
  • quant_linear(x, q_int8, scales) → x @ dequantize(...).T (the "apply" path).
  • memory_bytes(W) vs memory_bytes_quant(q_int8, scales) to show the saving.
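
The four functions above can be sketched as follows. This is one possible reading of the spec, not the lab's reference solution. Two details are assumptions rather than part of the task: an all-zero row is given a scale of 1.0 to avoid dividing by zero, and rounding uses np.rint (round half to even).

```python
import numpy as np

def quantize_per_channel(W):
    """Symmetric int8 quantization with one scale per output channel (row)."""
    W = np.asarray(W, dtype=np.float32)
    max_abs = np.max(np.abs(W), axis=1)                      # shape (out,)
    # Assumption: an all-zero row gets scale 1.0 so the division below is safe.
    scales = np.where(max_abs > 0, max_abs / 127.0, 1.0).astype(np.float32)
    q = np.rint(W / scales[:, None])                         # round to nearest integer
    q_int8 = np.clip(q, -127, 127).astype(np.int8)           # clip is a safety net
    return q_int8, scales

def dequantize(q_int8, scales):
    """Rebuild an fp32 approximation of W from int8 values and row scales."""
    return scales[:, None] * q_int8.astype(np.float32)

def quant_linear(x, q_int8, scales):
    """The 'apply' path: dequantize the weights, then do an ordinary matmul."""
    return x @ dequantize(q_int8, scales).T

def memory_bytes(W):
    return W.nbytes

def memory_bytes_quant(q_int8, scales):
    return q_int8.nbytes + scales.nbytes

# Usage on a small random matrix.
rng = np.random.default_rng(0)
W = rng.standard_normal((8, 16)).astype(np.float32)
x = rng.standard_normal((4, 16)).astype(np.float32)
q_int8, scales = quantize_per_channel(W)
y = quant_linear(x, q_int8, scales)
print(memory_bytes(W), memory_bytes_quant(q_int8, scales))
```

Dequantizing the whole matrix before the matmul keeps the sketch simple; it saves storage, not compute.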

Then in tests:

  • round-trip error ||W - dequant(quant(W))|| is small relative to ||W||,
  • per-channel beats per-tensor on a matrix with one large-magnitude row (outlier channel),
  • int8 storage is ~4× smaller than fp32 (1 byte vs 4, plus a few scale floats),
  • quant_linear(x, ...) ≈ x @ W.T within tolerance.
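
A minimal sketch of how three of these checks might look as pytest-style functions. The helper names (roundtrip, make_weights), the matrix shapes, and the 2% tolerances are hypothetical choices for illustration; the lab's own tests may use different ones. The compact roundtrip helper stands in for the lab functions so the block runs on its own.

```python
import numpy as np

def roundtrip(W):
    """Compact per-channel int8 quantize + dequantize (stand-in for the lab functions)."""
    scales = np.max(np.abs(W), axis=1, keepdims=True) / 127.0   # shape (out, 1)
    q = np.clip(np.rint(W / scales), -127, 127).astype(np.int8)
    return q, scales, scales * q                                # q_int8, scales, W_approx

def make_weights(seed=0, shape=(64, 128)):
    return np.random.default_rng(seed).standard_normal(shape).astype(np.float32)

def test_roundtrip_error_is_small():
    W = make_weights()
    _, scales, W_approx = roundtrip(W)
    # Rounding moves each element by at most half a quantization step.
    assert np.all(np.abs(W - W_approx) <= 0.5 * scales + 1e-6)
    # Hypothetical tolerance: 2% relative Frobenius error.
    assert np.linalg.norm(W - W_approx) / np.linalg.norm(W) < 0.02

def test_storage_is_about_4x_smaller():
    W = make_weights()
    q, scales, _ = roundtrip(W)
    ratio = W.nbytes / (q.nbytes + scales.nbytes)
    assert 3.8 < ratio < 4.0        # slightly under 4x because of the scales

def test_quant_linear_matches_fp32():
    W = make_weights()
    x = make_weights(seed=1, shape=(4, 128))
    _, _, W_approx = roundtrip(W)
    y_ref, y_q = x @ W.T, x @ W_approx.T
    assert np.linalg.norm(y_q - y_ref) / np.linalg.norm(y_ref) < 0.02
```

The half-step bound in the first test is what makes "small relative to ||W||" concrete: the worst-case error per element is scale[o] / 2, so a smaller scale directly means a tighter round trip.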

Why per-channel beats per-tensor (the key insight)

One channel with large weights forces a huge per-tensor scale, crushing the resolution of all the small channels. A per-channel scale gives each row its own dynamic range. You'll measure this — it's the reason real methods are at least per-channel, and 4-bit methods go per-group.
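
The effect can be measured directly. The sketch below is an illustration, not part of the lab: it scales one row of a random matrix by 100 (an arbitrary choice of outlier) and compares the round-trip error on the remaining rows under a single per-tensor scale versus one scale per row. The helper names int8_roundtrip and rel_err_small_rows are hypothetical.

```python
import numpy as np

def int8_roundtrip(W, axis=None):
    """Quantize to int8 and dequantize.
    axis=None: one scale for the whole tensor; axis=1: one scale per row."""
    scale = np.max(np.abs(W), axis=axis, keepdims=True) / 127.0
    q = np.clip(np.rint(W / scale), -127, 127).astype(np.int8)
    return scale * q

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 64)).astype(np.float32)
W[3] *= 100.0                        # one outlier channel (arbitrary choice)

small_rows = np.arange(W.shape[0]) != 3

def rel_err_small_rows(W_approx):
    """Relative Frobenius error measured only on the non-outlier rows."""
    diff = (W - W_approx)[small_rows]
    return np.linalg.norm(diff) / np.linalg.norm(W[small_rows])

err_tensor = rel_err_small_rows(int8_roundtrip(W, axis=None))
err_channel = rel_err_small_rows(int8_roundtrip(W, axis=1))
print(f"per-tensor : {err_tensor:.4f}")
print(f"per-channel: {err_channel:.4f}")
```

With the per-tensor scale, the quantization step is set by the outlier row, so most entries in the small rows round to zero or to a handful of levels; with per-row scales, every row uses the full [-127, 127] range.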

Definition of done

pytest phase-06-quantization/labs -q

Map to the real engine

| your numpy | real vLLM |
|---|---|
| quantize_per_channel (offline) | how a checkpoint was quantized (GPTQ/AWQ/ModelOpt) |
| create_weights (store q + scales) | Fp8LinearMethod.create_weights (fp8.py:316) |
| quant_linear (dequant + matmul) | LinearMethodBase.apply (fp8.py:437) → a GEMM kernel (Phase 7) |
| per-channel vs per-tensor | per-tensor/channel/group scale choices in real configs |