Ollama · Infrastructure

Faster Gemma 4 on MLX with multi-token prediction

Gemma 4 now runs significantly faster in Ollama 0.31 on Apple Silicon thanks to multi-token prediction (MTP), powered by MLX. When used with coding agents, it is up to 90% faster, as measured on the Aider polyglot benchmark.