LessWrong AI · Communities

Can weak AI watch strong AI?

The more capabilities new frontier models gain, the more sharply the question arises: how will we know when a model is doing something it shouldn't? Today, when models write text and generate 10,000 lines of code at a time, we can't be sure there is no malicious segment hidden in that code. Every action of the model requires