X · @teortaxesTex
faster algorithmic progress could have made things much weirder
still can
Hugh Zhang: A question I’ve been pondering: what if we'd known about o1 / RL on chain-of-thought back in the early days of LLMs? It turns out SFT + a bit of RL on GPT-2 almost matches the performance of a fine-tuned GPT-3 (12b) on GSM8K — a model wi