Training · arXiv cs.AI · 8 d ago

Learning What to Predict: Downstream-Guided Task Design for Continued Pretraining

The paper introduces V-pretraining, a method for continued pretraining in which a lightweight task designer uses feedback on downstream performance to shape the pretraining task; the downstream signal is never used to update the learner's parameters directly. The designer produces adaptive top-K soft targets for language modeling and learned views for self-supervised vision. Reported results include a +7.4 point gain in GSM8K Pass@1 for Qwen2.5-0.5B and improved transfer performance for DINOv3 on vision tasks. For practitioners, the approach offers a way to steer pretraining toward target capabilities while maintaining generalization.
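To make the idea of top-K soft targets concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the paper's implementation: the function names, the `designer_scores` input, and the fixed `k` and `mix` parameters are all hypothetical, and in the paper the designer adapts the targets rather than taking them as constants. The sketch blends the usual one-hot next-token target with a distribution over the designer's K highest-scoring tokens, then computes the learner's cross-entropy against that blended target.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_soft_target(designer_scores, gold_index, k, mix):
    """Blend a one-hot target with a top-k distribution (hypothetical helper).

    designer_scores: per-token scores from an assumed task designer.
    gold_index: index of the observed next token.
    k: number of highest-scoring tokens that receive soft mass.
    mix: weight in [0, 1] given to the soft part; mix=0 recovers one-hot.
    """
    vocab = len(designer_scores)
    top = sorted(range(vocab), key=lambda i: designer_scores[i], reverse=True)[:k]
    probs = softmax([designer_scores[i] for i in top])
    target = [0.0] * vocab
    for i, p in zip(top, probs):
        target[i] = mix * p
    target[gold_index] += 1.0 - mix
    return target


def soft_cross_entropy(learner_logits, target):
    """Cross-entropy of the learner's logits against a soft target."""
    m = max(learner_logits)
    log_z = m + math.log(sum(math.exp(l - m) for l in learner_logits))
    return -sum(t * (l - log_z) for t, l in zip(target, learner_logits) if t > 0)


if __name__ == "__main__":
    target = top_k_soft_target([2.0, 0.5, 1.0, -1.0], gold_index=3, k=2, mix=0.3)
    print(target)
    print(soft_cross_entropy([0.1, 0.2, 0.3, 0.4], target))
```

With `mix=0` the target is one-hot and the loss reduces to ordinary next-token cross-entropy, so the soft part can be read as a controlled departure from the standard objective.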

pretraining · task · self-supervised
