
Should we combine protocols for AI Control Research?

In AI control[1] research, we've developed many protocols: trusted monitoring, untrusted monitoring, resampling, etc. Each has a different safety-usefulness tradeoff, and labs might use a combination of them. The reason: they might want the maximum usefulness from their AI models subject to a minimum acceptable level of safety (e