arXiv cs.CL
SEAD: Competence-Aware On-Policy Distillation via Entropy-Guided Supervision
arXiv:2606.28562v1 (Announce Type: new)
Abstract: On-policy distillation (OPD) has a property absent in offline distillation and RL: teacher supervision quality depends on student competence. Incoherent rollouts yield noisy gradients; already-mastered tokens yield redundant ones. This creates waste at three scales (token