arXiv cs.CL · Papers
Representational Depth of Evaluation Awareness Shifts With Scale in Open-Weight Language Models
arXiv:2606.29196v1 Announce Type: cross
Abstract: Do language models know when they are being tested? This question matters for AI safety: a model that recognises an evaluation context could alter its behaviour strategically, making downstream benchmarks harder to interpret. Using 11 models spanning Qwen 2.5, Gemma 2,