The verifier-based vs. verifier-free test-time scaling result is older than people act like it is, and it keeps getting confirmed [D]
The Setlur et al. result that scaling test-time compute without verification or RL is provably suboptimal keeps showing up in my reading and I think…