r/LocalLLaMA

Ornith-1.0-35B Q3_K_M: ~17 GB VRAM, KLD-checked against BF16
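Some context on the "KLD-checked against BF16" part of the title. KL divergence measures how far the quantized model's next-token distribution drifts from the BF16 reference at each position. You average it over a test text, and lower is better: 0 means the distributions are identical. The sketch below is a minimal stdlib-only version. It assumes you have already dumped per-position logits from both models over the same tokens. The function names are hypothetical and are not llama.cpp's API.

```python
import math
from typing import Sequence


def log_softmax(logits: Sequence[float]) -> list[float]:
    """Numerically stable log-softmax over one vocabulary-sized logit vector."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def token_kld(ref_logits: Sequence[float], quant_logits: Sequence[float]) -> float:
    """KL(P_ref || P_quant) in nats for a single token position.

    P_ref is the BF16 model's distribution, P_quant the quantized model's.
    """
    lp = log_softmax(ref_logits)
    lq = log_softmax(quant_logits)
    # sum_i p_i * (log p_i - log q_i)
    return sum(math.exp(a) * (a - b) for a, b in zip(lp, lq))


def mean_kld(
    ref_rows: Sequence[Sequence[float]],
    quant_rows: Sequence[Sequence[float]],
) -> float:
    """Average per-position KLD over matching rows of logits (hypothetical helper)."""
    if len(ref_rows) != len(quant_rows):
        raise ValueError("reference and quantized runs must cover the same positions")
    return sum(token_kld(r, q) for r, q in zip(ref_rows, quant_rows)) / len(ref_rows)


if __name__ == "__main__":
    # Toy 3-token vocabulary, two positions; real runs use the full vocab.
    ref = [[2.0, 1.0, 0.1], [0.5, 0.5, 3.0]]
    quant = [[1.9, 1.1, 0.1], [0.6, 0.4, 2.8]]
    print(f"mean KLD: {mean_kld(ref, quant):.6f} nats")
```

The direction matters. KL(BF16 || quant) weights the error by the reference model's probabilities, so drift on tokens the full-precision model considers likely counts most.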

I quantized deepreinforce-ai/Ornith-1.0-35B down to Q3_K_M so it fits comfortably on a single GPU. I produced it locally with llama-quantize from the upstream BF16 GGUF. The quantizer took it from 16.01 BPW down to 3.87 BPW, landing at 16.8 GB on disk and ~17 GiB of VRAM once loaded, about 21% smaller than Q4_K_M. It’s the smallest