LessWrong AI · Communities

Neuralese is Actually Probably Good for Alignment

The best language models are still getting smarter and more capable. Increasingly, this is because they are trained with Reinforcement Learning with Verifiable Rewards (RLVR). Chain-of-thought reasoning allows models to evade the finite-depth restriction on information flow by passing (relatively little) information