arXiv cs.CL June 30, 2026 · Papers

Representational Depth of Evaluation Awareness Shifts With Scale in Open-Weight Language Models

arXiv:2606.29196v1 Announce Type: cross Abstract: Do language models know when they are being tested? This question matters for AI safety: a model that recognises an evaluation context could alter its behaviour strategically, making downstream benchmarks harder to interpret. Using 11 models spanning Qwen 2.5, Gemma 2,

Read original