arXiv cs.AI
GEOALIGN: Geometric Rollout Curation for Robust LLM Reinforcement Learning
arXiv:2606.26917v1 · Announce Type: cross

Abstract: Online reinforcement learning is widely used to align large language models (LLMs) with reward signals, yet training can be unstable under noisy or misspecified rewards. We identify a failure mode we call directional inconsistency: within a batch, a small set of high-re