arXiv cs.LG · Papers
Offline Reinforcement Learning of High-Quality Behaviors Under Robust Style Alignment
arXiv:2601.22823v2 Announce Type: replace
Abstract: We study offline reinforcement learning of style-conditioned policies using explicit style supervision via subtrajectory labeling functions. In this setting, aligning style with high task performance is particularly challenging due to distribution shift and inherent c