r/LocalLLaMA

Does quantizing change the MTP draft rate?

Speculative decoding speeds up LLM generation by having a small "drafter" model propose several tokens ahead of the main model. The main model then verifies those proposals in a single forward pass and keeps the longest prefix it agrees with. If the main model is heavily quantized (low bit-width), its outputs become less "consistent" with the drafter, lowering the ac