r/LocalLLaMA · Communities

I did some model hacks, and got GLM5.2 from about 2.5 tok/s to >50 tok/s on my GH200 system.

G'day. This is part 3 of my Local LLM adventures. I have a crazy hacked server-to-desktop system:

| Component | Spec |
|---|---|
| GPUs | 2x Hopper H100, 96 GB HBM3 each |
| CPUs | 2x Grace, 72 cores each |
| Host memory | 480 GB LPDDR5X per Grace, 960 GB total |

So I can technically run GLM5.2. Except the naive settings were crap, like 2.5