r/LocalLLaMA July 3, 2026 · Communities

GLM5.2 performance.

I was wondering how fast GLM5.2 (Nvidia’s 460GB NVFP4 checkpoint) is running on your rigs. I have it running at ~1 tok/s in the simulation harness, and the data extrapolates to 75 tok/s on the real multi-GPU CUDA machine. So I would like to collect data on how fast it runs for you. State your tok/s first so I can easily par
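For anyone who wants to tally the replies, here is a minimal sketch of pulling a leading tok/s figure out of a reply. It assumes the reply starts with a number followed by a unit such as `tok/s`, `tokens/s`, or `t/s`. The function name `leading_tok_per_s` and the accepted unit spellings are my own assumptions, not something the thread defines.

```python
import re
from typing import Optional

# Hypothetical pattern: an optional "~", a number at the very start of the
# reply, then a throughput unit (tok/s, tokens/s, t/s, tok/sec, ...).
LEADING_RATE = re.compile(
    r"^\s*~?\s*(\d+(?:\.\d+)?)\s*(?:tok(?:ens?)?|t)\s*/\s*s(?:ec)?\b",
    re.IGNORECASE,
)


def leading_tok_per_s(reply: str) -> Optional[float]:
    """Return the tok/s figure a reply opens with, or None if it has none."""
    match = LEADING_RATE.match(reply)
    return float(match.group(1)) if match else None


if __name__ == "__main__":
    # Made-up sample replies, for illustration only.
    for text in ["75 tok/s on the real machine", "12.5 t/s", "Runs fine for me"]:
        print(repr(text), "->", leading_tok_per_s(text))
```

Replies that do not open with a number and a unit come back as `None`, so they can be set aside and read by hand instead of being misparsed.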
