Skip to content
r/MachineLearning · Communities

A debugger for RL reward functions that detects reward hacking during training [P]

While experimenting with GRPO training, I kept running this shit that when reward increases, it becomes difficult to tell whether the policy is genuinely improving or simply exploiting the reward function. So I built a small library called rewardspy that wraps an existing reward function and continuously monitors indic