Training · arXiv cs.AI · 21 h ago

A Unifying Lens on Supervised Fine-Tuning Through Target Distribution Design

The paper introduces the Q-target framework for supervised fine-tuning (SFT), which reinterprets SFT as the design of a target distribution and so allows a finer-grained approach to token alignment. The proposed Target-SFT method, which builds the training objective around a desired target distribution, improves performance across ten combinations of reasoning dataset and model. The work offers a foundational principle for SFT design and gives practitioners a broader range of strategies for choosing fine-tuning objectives for LLMs.
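To make the "target distribution" reading concrete, here is a minimal sketch for a single token position. It assumes the usual view that standard SFT minimizes cross-entropy against a one-hot target on the reference token. The alternative target shown is plain label smoothing, chosen only as a familiar illustration. It is not the paper's Target-SFT objective, and the function names, logits, and `eps` value are hypothetical.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(target, logits):
    """H(q, p) = -sum_v q(v) * log p(v) for one token position."""
    probs = softmax(logits)
    return -sum(q * math.log(p) for q, p in zip(target, probs) if q > 0)

def one_hot(index, vocab_size):
    """Target used by standard SFT: all mass on the reference token."""
    return [1.0 if i == index else 0.0 for i in range(vocab_size)]

def smoothed_target(index, vocab_size, eps):
    """Illustrative alternative target (label smoothing), NOT the paper's
    method: move a fraction eps of the mass to a uniform distribution."""
    base = eps / vocab_size
    return [base + (1.0 - eps if i == index else 0.0) for i in range(vocab_size)]

# Hypothetical logits over a 4-token vocabulary; token 0 is the reference.
logits = [2.0, 0.5, -1.0, 0.0]
label = 0

sft_loss = cross_entropy(one_hot(label, len(logits)), logits)
soft_loss = cross_entropy(smoothed_target(label, len(logits), 0.1), logits)

print(f"one-hot target loss:  {sft_loss:.4f}")
print(f"smoothed target loss: {soft_loss:.4f}")
```

Only the target vector changes between the two losses, while the cross-entropy form stays the same. That is the sense in which choosing a fine-tuning objective can be read as choosing a target distribution.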

fine-tuning · target distribution · supervised learning
