StarOR: Synergizing Tree Search and Test-Time Reinforcement Learning for Optimization Modeling
StarOR is a framework for optimization modeling that integrates Monte Carlo Tree Search (MCTS) with test-time reinforcement learning. It addresses two limitations of earlier approaches: limited adaptability to individual problem instances and error propagation across modeling steps. StarOR decomposes the modeling process into four stages. At non-terminal nodes of the search tree, it updates a transient LoRA adapter with Group Relative Policy Optimization (GRPO), using the sibling nodes generated by MCTS as the comparison group, which refines the policy for each specific instance. In experiments with a 4-billion-parameter backbone, StarOR achieves state-of-the-art performance across five optimization benchmarks, making it a strong option for practitioners who need robust, adaptable optimization modeling.
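The summary does not spell out how StarOR scores siblings or applies the LoRA update, so the following is only a minimal sketch of the group-relative advantage that GRPO is built on, assuming that the sibling expansions of one non-terminal node form the group and that each sibling already has a scalar reward. The function name `group_relative_advantages` and the reward values are hypothetical.

```python
import statistics
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """GRPO-style advantages: center each reward on the group mean and scale by the group std.

    Assumption (not from the source): `rewards` holds one scalar score per
    MCTS sibling expansion of the same non-terminal node.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std over the sibling group
    # eps avoids division by zero when all siblings receive the same reward
    return [(r - mean) / (std + eps) for r in rewards]


# Hypothetical rewards for four sibling expansions of one node
sibling_rewards = [1.0, 0.0, 1.0, 0.0]
advantages = group_relative_advantages(sibling_rewards)
print([round(a, 3) for a in advantages])  # prints [1.0, -1.0, 1.0, -1.0]
```

Because the advantages are relative to the group, siblings that all score the same produce zero advantage and contribute no learning signal; only differences among siblings drive the update. In a full GRPO step these advantages would weight a clipped policy-gradient loss on the adapter parameters, which is not shown here.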