Training · arXiv cs.AI · 8 d ago

Deep Dense Exploration for LLM Reinforcement Learning via Pivot-Driven Resampling

The paper presents Deep Dense Exploration (DDE), a reinforcement learning strategy for large language models, implemented in the DEEP-GRPO framework. DDE combines three components: a lightweight utility function that identifies pivotal states, local dense resampling from those states to discover more trajectories, and a dual-stream optimization objective that separates global policy learning from local updates. On mathematical reasoning benchmarks, DEEP-GRPO outperforms existing methods such as GRPO and tree-based approaches, targeting the difficulty of exploring the vast space of natural language sequences.

reinforcement learning · llm · exploration
