arXiv cs.LG · Papers

Offline Reinforcement Learning of High-Quality Behaviors Under Robust Style Alignment

arXiv:2601.22823v2 (announce type: replace)

Abstract: We study offline reinforcement learning of style-conditioned policies using explicit style supervision via subtrajectory labeling functions. In this setting, aligning style with high task performance is particularly challenging due to distribution shift and inherent c