r/LocalLLaMA

High-quality GLM-5.2 Quant on 4x DGX Spark – Guide, Results, and Comps

I got GLM-5.2 NVFP4 running on four DGX Sparks at 128K context. This is still a niche, hacky setup, but it is now a usable serving configuration rather than just a proof of life.

Objective: a high-quality 4-bit quant running on 4x DGX Spark.

Model: https://huggingface.co/Mapika/GLM-5.2-NVFP4

TL;DR: 128K context at fp8_ds_mla, ~15-16