
Eval-Awareness Steering Detects the Test, Not the Sabotage

Produced as part of independent research.

Huge thanks to Apollo Research (org) for open-sourcing the deception-detection harness, which proved foundational to this work. Prior work by Devbunova (2026), the Apollo/Goldowsky-Dill probing line, and Tice et al. on noise injection shaped the design throughout.

Summary

I te