r/LocalLLaMA
Gemma 4 26BA4B Surprisingly Usable at IQ3_S – Are small quants really this usable?
I've been experimenting with lower quants of Gemma 4 26B on my M3 MacBook Air with 16 GB of RAM. The quant decodes at a solid 25 tokens per second and is really close to bf16 for my use cases (no coding, tool calling). Is this confirmation bias on my part, or are the UD Q3 quants surprisingly good? Anyhow, huge props to the Unslo