r/LocalLLaMA

Slow performance Unsloth Gemma 12B Q8

I recently replaced GPT-OSS 20B Q4 with Gemma 4 12B Q8, but I went from roughly 70 t/s to 10 t/s. Am I doing something wrong? In the current session I am trying a Q5 model, with no change in performance measured against the Q8.

[Service]
Type=simple
User=root
WorkingDirectory=/root/llama.cpp
ExecStart=/root/llama.cpp/b