Step-3.7-Flash (198B-A11B vision MoE) on 4×3090: fully-resident IQ3_XXS beats the spilled IQ4 by 2.4×, and MTP speculative decode silently breaks vision
TLDR: Biggest quant that fits 100% in VRAM beats a higher quant that spills. And if your MoE ships an MTP draft head, the draft can't…