arXiv cs.AI
ATOD: Annealed Turn-aware On-policy Distillation for Multi-turn Autonomous Agents
arXiv:2606.27814v1 Announce Type: new
Abstract: Training small language-model agents for long-horizon interactive tasks requires both fast imitation and reward-driven improvement. On-policy distillation (OPD) provides dense teacher guidance and typically improves rapidly in the early stage, but its gains saturate once