Alignment Forum

Predicting LLM Safety Before Release by Simulating Deployment

Before releasing a new model, labs need to understand not just what it can do, but how it is likely to behave in real-world use, including where it might introduce new risks. This becomes even more important as capabilities increase. As part of our pre-deployment safety review, we leverage targeted evaluation