X · @teortaxesTex · X / Twitter

RT Ant Ling: Great breakdown from Qian.

In our recent UFP4 paper, we show that a uniform-grid FP4 recipe achieves lower BF16-relative loss degradation than strong E2M1 baselines across Dense 1.5B, MoE 7.9B, and MoE 124B long-run pretraining.

Full paper: https://arxiv.org/abs/2606.20381

Qian: Should FP4 training still defaul