arXiv cs.AI June 26, 2026 · Papers

GEOALIGN: Geometric Rollout Curation for Robust LLM Reinforcement Learning

arXiv:2606.26917v1 Announce Type: cross

Abstract: Online reinforcement learning is widely used to align large language models (LLMs) with reward signals, yet training can be unstable under noisy or misspecified rewards. We identify a failure mode we call directional inconsistency: within a batch, a small set of high-re
