Skip to content
LessWrong AI · Communities

Refusal Is Complicated As Hell: An Update

TL;DRIt would make sense to briefly skim through our previous post that introduces our experiments on refusal in LLMs. There we explain how it started, here we’ll tell how it’s going.The primary goal of this text is to try and structure the list of whack-a-mole research questions. The secondary goal is to get some outs