LessWrong AI June 28, 2026 · Communities

Refusal Is Complicated As Hell: An Update

TL;DR: It would make sense to briefly skim through our previous post that introduces our experiments on refusal in LLMs. There we explain how it started; here we’ll tell how it’s going. The primary goal of this text is to try and structure the list of whack-a-mole research questions. The secondary goal is to get some outs
