Training
Vocabulary Dropout for Curriculum Diversity in LLM Co-Evolution
The paper introduces "vocabulary dropout," a technique applied to a language model's logits during co-evolutionary self-play to keep problem generation diverse. The mask is hard and non-stationary, which prevents the proposer model from converging to a narrow output distribution and keeps the curriculum informative for the solver. In mathematical-reasoning training with Qwen3-4B and Qwen3-8B, the method yields an average improvement of +4.4 points on competition-level benchmarks, suggesting that explicit action-space constraints of this kind can benefit co-evolutionary learning in language models.
curriculum_learning, co-evolution, diversity
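As a rough illustration of the masking idea summarized above, the following is a minimal NumPy sketch, not the paper's implementation. It reads "hard" as setting the logits of dropped tokens to negative infinity (zero probability) and "non-stationary" as drawing a fresh mask each round. The function names, the `drop_rate` value, the per-round resampling schedule, and the `protected_ids` argument (for tokens such as end-of-sequence that should never be dropped) are all assumptions made for the example.

```python
import numpy as np


def sample_vocab_mask(vocab_size, drop_rate, rng, protected_ids=()):
    """Draw a boolean mask over the vocabulary; True means the token stays available.

    drop_rate and protected_ids are illustrative assumptions, not values from the paper.
    """
    keep = rng.random(vocab_size) >= drop_rate
    protected = np.asarray(list(protected_ids), dtype=np.intp)
    keep[protected] = True
    if not keep.any():
        # Guard against dropping every token, which would make softmax undefined.
        keep[rng.integers(vocab_size)] = True
    return keep


def apply_vocab_dropout(logits, keep):
    """Hard mask: dropped tokens get -inf logits, so their probability is exactly zero."""
    return np.where(keep, logits, -np.inf)


def softmax(x):
    """Numerically stable softmax over the last axis."""
    z = x - np.max(x, axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    vocab_size = 16
    logits = rng.normal(size=vocab_size)  # stand-in for proposer logits
    for round_idx in range(3):
        # Non-stationary: a new mask is drawn every round (assumed schedule).
        keep = sample_vocab_mask(vocab_size, drop_rate=0.25, rng=rng, protected_ids=(0,))
        probs = softmax(apply_vocab_dropout(logits, keep))
        token = rng.choice(vocab_size, p=probs)
        print(round_idx, int(keep.sum()), int(token))
```

Because the mask changes between rounds, no fixed subset of tokens is always available, which is one plausible way a proposer could be kept from settling on a narrow output distribution. In a real training loop the mask would be applied to the model's logits before sampling, at whatever granularity the method actually specifies.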