Training
How Does Reasoning Flow? Tracing Attention-Induced Information Flow for Targeted RL in LLMs
FlowTracer is a reinforcement learning framework that improves token-level credit assignment in large language models (LLMs) by tracing answer-targeted reasoning flows on an attention-induced directed acyclic graph. It uses aggregated attention weights to assign credit according to global information propagation, which identifies the critical tokens that influence the reasoning outcome. With this more precise learning signal, FlowTracer is reported to achieve consistent performance gains across a range of reasoning tasks.
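The idea of tracing flow backward from the answer over an attention graph can be illustrated with a small sketch. This is not FlowTracer's actual algorithm, which the summary above does not specify. It assumes a causal attention matrix already aggregated over heads and layers, ignores self-attention, and uses hypothetical names (`trace_answer_flow`, `scale_advantage`) chosen only for illustration.

```python
from typing import List, Sequence


def trace_answer_flow(
    attn: Sequence[Sequence[float]], answer_positions: Sequence[int]
) -> List[float]:
    """Propagate a unit of 'flow' from answer tokens back to earlier tokens.

    attn[i][j] is the (assumed, pre-aggregated) attention from token i to an
    earlier token j < i. Edges j -> i with j < i form a DAG, so one backward
    pass in reverse token order visits every node after all of its successors.
    """
    n = len(attn)
    flow = [0.0] * n
    for pos in answer_positions:
        flow[pos] = 1.0  # seed: answer tokens are the targets of the trace

    for i in range(n - 1, -1, -1):
        if flow[i] == 0.0:
            continue
        for j in range(i):  # only strictly earlier tokens; diagonal ignored
            flow[j] += attn[i][j] * flow[i]
    return flow


def scale_advantage(advantage: float, flow: Sequence[float]) -> List[float]:
    """Hypothetical credit rule: scale a sequence-level advantage per token."""
    peak = max(flow) if flow else 0.0
    if peak <= 0.0:
        return [0.0] * len(flow)
    return [advantage * f / peak for f in flow]


# Toy 4-token example; the last token plays the role of the answer.
toy_attn = [
    [0.0, 0.0, 0.0, 0.0],
    [1.0, 0.0, 0.0, 0.0],
    [0.5, 0.5, 0.0, 0.0],
    [0.0, 0.25, 0.75, 0.0],
]
toy_flow = trace_answer_flow(toy_attn, answer_positions=[3])
toy_credit = scale_advantage(2.0, toy_flow)
```

In the toy example the first token receives no direct attention from the answer token, yet it accumulates flow through the intermediate tokens. That indirect contribution is the kind of global propagation a purely local attention score would miss.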
reinforcement learning, llm, reasoning