Skip to content
LessWrong AI · Communities

When capabilities work is the *safe* bet

If you believe that LLMs lend themselves unusually well to alignment compared to other regimes, this can be a very good reason to start doing capability research on them rather than LLM safety research. Imagine you have these beliefs about how AI goes: mjx-container[jax="CHTML"] { line-height: 0; } mjx-container [space