X · @teortaxesTex
Just realized that this is GRPO-brained, generally ORM-brained
dense process reward signal, in theory, would let you progress even if you do not have "positive trajectories". Of course, flaws of PRMs haven't disappeared either

I still think PRMs were a psyop

Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞): @stevehou "GLM already pr