arXiv cs.CL · Papers

SEAD: Competence-Aware On-Policy Distillation via Entropy-Guided Supervision

arXiv:2606.28562v1

Abstract: On-policy distillation (OPD) has a property absent in offline distillation and RL: teacher supervision quality depends on student competence. Incoherent rollouts yield noisy gradients; already-mastered tokens yield redundant ones. This creates waste at three scales (token