X · @jeremyphoward
RT hallerite: GLM5.2 brings back the critic. It was just a matter of time until people realized that group-based variance reduction is unfeasible after some horizon length. We need to be more fine-grained. I am sure OAI and Ant have been using value models for quite some time.
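
To make the contrast concrete, here is a minimal sketch. It assumes "group-based variance reduction" means a GRPO-style baseline (normalize each trajectory's return against the other samples in its group) and "critic" means a learned value function used with generalized advantage estimation (GAE). The function names and numbers are hypothetical and not taken from GLM5.2. The group baseline yields one scalar per trajectory, shared by every step, while the critic yields a separate advantage per step, which is the finer granularity the post asks for.

```python
from statistics import mean, pstdev


def group_relative_advantages(returns, eps=1e-8):
    """Group baseline: one scalar advantage per trajectory.

    Each trajectory's total return is normalized by the mean and
    (population) standard deviation of its sampling group. Every step
    in a trajectory shares this single value.
    """
    mu = mean(returns)
    sigma = pstdev(returns)
    return [(r - mu) / (sigma + eps) for r in returns]


def gae_advantages(rewards, values, gamma=1.0, lam=0.95):
    """Critic baseline: one advantage per step via GAE.

    `values` must have len(rewards) + 1 entries; the last entry is the
    bootstrap value of the state after the final step (0.0 if terminal).
    """
    advantages = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        # One-step TD error under the critic's value estimates.
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages


# Four sampled trajectories with sparse terminal rewards (made-up numbers).
group_adv = group_relative_advantages([1.0, 0.0, 1.0, 0.0])

# One three-step trajectory with a made-up critic that predicts 0.5 throughout.
step_adv = gae_advantages([0.0, 0.0, 1.0], [0.5, 0.5, 0.5, 0.0], lam=0.0)
```

With `lam=0.0` the per-step advantages reduce to one-step TD errors, so only the step where the outcome departs from the critic's prediction receives credit. The group baseline would assign the same value to all three steps.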