X · @teortaxesTex · X / Twitter

RT Zixuan Li: Spent some time with the PostTrainBench paper. https://arxiv.org/abs/2603.08640 Lots of thinking on environment design, how to detect ch…

RT Zixuan Li: Spent some time with the PostTrainBench paper. https://arxiv.org/abs/2603.08640

Lots of thinking on environment design, how to detect cheating, and what future evals should actually be measuring. Worth a read if you're following AI R&D automation.

Hardik Bhatnagar: New #1 on PostTrainBench: GLM 5.2 (Max reason