X · @teortaxesTex · X / Twitter

faster algorithmic progress could have made things much weirder
still can

Hugh Zhang: A question I’ve been pondering: what if we'd known about o1 / RL on chain-of-thought back in the early days of LLMs? It turns out SFT + a bit of RL on GPT-2 almost matches the performance of a fine-tuned GPT-3 (12b) on GSM8K, a model wi