r/MachineLearning

The verifier-based vs verifier-free test-time scaling result is older than people act like it is, and it keeps getting confirmed [D]

The Setlur et al. result, that scaling test-time compute without verification or RL is provably suboptimal, keeps showing up in my reading, and I think it deserves more weight than the "yet another scaling paper" treatment it got. The core claim is that verifier-based methods (RL, or search guided by a verifier) dominate ve