arXiv cs.AI
Breaking the Mirror: Activation-Based Mitigation of Self-Preference in LLM Evaluators
arXiv:2509.03647v2 Announce Type: replace-cross
Abstract: Large language models (LLMs) increasingly serve as automated evaluators, yet they suffer from "self-preference bias": a tendency to favor their own outputs over those of other models. This bias undermines fairness and reliability in evaluation pipelines, particu