arXiv cs.AI June 24, 2026 · Papers

Breaking the Mirror: Activation-Based Mitigation of Self-Preference in LLM Evaluators

arXiv:2509.03647v2 Announce Type: replace-cross

Abstract: Large language models (LLMs) increasingly serve as automated evaluators, yet they suffer from "self-preference bias": a tendency to favor their own outputs over those of other models. This bias undermines fairness and reliability in evaluation pipelines, particu
