r/MachineLearning
The verifier-based vs verifier-free test-time scaling result is older than people act, and it keeps getting confirmed [D]
The Setlur et al. result that scaling test-time compute without verification or RL is provably suboptimal keeps showing up in my reading, and I think it deserves more weight than the "yet another scaling paper" treatment it got. The core claim is that verifier-based methods (RL or search guided by a verifier) dominate ve