Alignment Forum June 11, 2026 · Communities

Models May Behave Worse When Eval Aware

This is the first in a series of research updates from the Google DeepMind Language Model Interpretability team, in interpretability and adjacent areas.TL;DRIt's often assumed that models will act more aligned when they can tell they're being evaluated. But we find that Gemini can take “undesired” actions in behavioura

Read original