r/MachineLearning
A debugger for RL reward functions that detects reward hacking during training [P]
While experimenting with GRPO training, I kept running into the same problem: when reward increases, it's hard to tell whether the policy is genuinely improving or simply exploiting the reward function. So I built a small library called rewardspy that wraps an existing reward function and continuously monitors indic