Training · arXiv cs.AI · 11 d ago

Vocabulary Dropout for Curriculum Diversity in LLM Co-Evolution

The paper introduces "vocabulary dropout," a technique that masks the proposer's output logits during co-evolutionary self-play to keep problem generation diverse. The mask is hard and non-stationary, which prevents the proposer from collapsing to a narrow output distribution and keeps the curriculum informative for the solver. In mathematical-reasoning experiments with Qwen3-4B and Qwen3-8B, the authors report an average improvement of +4.4 points on competition-level benchmarks, suggesting that explicit action-space constraints of this kind can help co-evolutionary training of language models.
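To make the idea of a hard, non-stationary logit mask concrete, here is a minimal sketch in plain Python. It is not the paper's implementation. It assumes that "hard" means dropped tokens are set to negative infinity (probability exactly zero) and that "non-stationary" means the mask is resampled on a schedule, here once per loop iteration. The function names, the `drop_rate` value, and the `protected` token list are all hypothetical choices made for illustration.

```python
import math
import random


def sample_keep_mask(vocab_size, drop_rate, rng, protected=()):
    """Return a list of booleans: True where the token stays available.

    `protected` (an assumption of this sketch) lists token ids that are
    never dropped, e.g. an end-of-sequence token.
    """
    keep = [rng.random() >= drop_rate for _ in range(vocab_size)]
    for token_id in protected:
        keep[token_id] = True
    return keep


def apply_vocab_dropout(logits, keep):
    """Hard mask: dropped tokens get -inf, so their probability is exactly 0."""
    return [x if k else -math.inf for x, k in zip(logits, keep)]


def softmax(logits):
    """Numerically stable softmax; requires at least one finite logit."""
    peak = max(x for x in logits if x != -math.inf)
    exps = [0.0 if x == -math.inf else math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    rng = random.Random(0)
    logits = [2.0, 1.0, 0.5, 0.1, -1.0, 0.0]  # toy 6-token vocabulary
    for step in range(3):
        # Resampling the mask each step is what makes it non-stationary.
        keep = sample_keep_mask(len(logits), drop_rate=0.5, rng=rng, protected=(5,))
        probs = softmax(apply_vocab_dropout(logits, keep))
        print(step, keep, [round(p, 3) for p in probs])
```

Because the mask changes between steps, the proposer in this sketch cannot rely on any fixed subset of tokens, which is the intuition behind the diversity effect the summary describes. How often the mask is resampled and which tokens are exempt are design choices this sketch leaves open.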

curriculum_learning · co-evolution · diversity
