arXiv cs.CL · Papers

Holistic Data Scheduler for LLM Pre-training via Multi-Objective Reinforcement Learning

arXiv:2606.24133v1 · Announce Type: cross

Abstract: The composition of training data, governed by the diversity of sources and their mixing strategy, is a cornerstone of Large Language Model (LLM) pre-training. Online Data Mixing (ODM), the technique of adaptively adjusting data mixtures during training, has emerged as a