Training
AdaSR: Adaptive Streaming Reasoning with Hierarchical Relative Policy Optimization
The paper introduces AdaSR, an adaptive streaming reasoning framework designed for dynamic input scenarios like audio and video streams, allowing models to reason during input processing and finalize decisions post-stream. It employs Hierarchical Relative Policy Optimization (HRPO) to optimize the reasoning process by separating it into streaming and deep reasoning phases, enhancing computational efficiency and accuracy while managing latency. AdaSR demonstrates improved performance over traditional supervised fine-tuning methods, making it a significant advancement for practitioners working with real-time reasoning in AI applications.
llmstreamingreasoning