Training
A Unifying Lens on Supervised Fine-Tuning Through Target Distribution Design
The paper introduces the Q-target framework for supervised fine-tuning (SFT), which reframes SFT as the design of a target distribution and so allows a more nuanced approach to token alignment. The proposed Target-SFT method derives its training objective from a desired target distribution and reports improved performance across ten reasoning dataset-model settings. The framework offers a foundational principle for SFT design and gives practitioners a broader range of strategies for choosing fine-tuning objectives in LLMs.
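To make the "target distribution" view concrete, here is a minimal sketch in plain Python. It is not the paper's Target-SFT objective, which this summary does not specify. It only illustrates the general idea that the usual SFT loss is the cross-entropy against a one-hot target, and that swapping in a different target changes the objective. The function names and the label-smoothing target are assumptions chosen for illustration.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over one token position."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def target_cross_entropy(logits, q):
    """Cross-entropy H(q, p) = -sum_v q[v] * log p[v] for one position.

    `q` is the target distribution over the vocabulary; `p` is the
    model distribution obtained from `logits`.
    """
    logp = log_softmax(logits)
    return -sum(qv * lp for qv, lp in zip(q, logp) if qv > 0.0)


def one_hot(index, size):
    """Target implied by standard SFT: all mass on the reference token."""
    return [1.0 if i == index else 0.0 for i in range(size)]


def smoothed(index, size, eps):
    """Hypothetical alternative target (label smoothing), used only as
    an example of a non-one-hot design; not the paper's method."""
    return [(1.0 - eps if i == index else 0.0) + eps / size for i in range(size)]


# Toy example: a 4-token vocabulary, reference token at index 0.
logits = [2.0, 0.5, -1.0, 0.0]
label = 0

nll = -log_softmax(logits)[label]  # standard SFT token loss
ce_one_hot = target_cross_entropy(logits, one_hot(label, len(logits)))
ce_smooth = target_cross_entropy(logits, smoothed(label, len(logits), 0.1))

print(f"NLL of reference token:       {nll:.4f}")
print(f"Cross-entropy, one-hot target: {ce_one_hot:.4f}")
print(f"Cross-entropy, smoothed target: {ce_smooth:.4f}")
```

The first two printed values coincide, which is the sense in which standard SFT corresponds to one particular target. Any other choice of `q` yields a different per-token objective under the same cross-entropy form.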
fine-tuning, target distribution, supervised learning