arXiv cs.LG June 29, 2026 · Papers

PEBS: Per-rater Empirical-Bayes Shrinkage for RLHF Reward-Model Calibration

arXiv:2606.27578v1 Announce Type: new Abstract: Reward models for Reinforcement Learning from Human Feedback (RLHF) pool preferences across thousands of annotators and fit one global affine calibrator, collapsing raters with systematically different rating-scale offsets and slopes into a single average-rater fit that d

Read original