Daily digest — 2026-07-02

How Does Reasoning Flow? Tracing Attention-Induced Information Flow for Targeted RL in LLMs

FlowTracer is a reinforcement learning framework that improves token-level credit assignment in large language models (LLMs) by tracing answer-targeted reasoning flows on a directed acyclic graph induced by attention. The graph is built from aggregated attention weights, so credit is assigned according to how information propagates globally toward the answer, which identifies the tokens that most influence the reasoning outcome. By making the learning signal more precise, FlowTracer yields consistent performance gains across a range of reasoning tasks, which makes it relevant to practitioners working on RL for LLMs.

arXiv cs.CL · 23 d ago · found 21 d ago · Training
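The summary does not give FlowTracer's exact propagation rule, so the following is only a minimal sketch of the general idea under stated assumptions: attention is averaged over all layers and heads, self-edges are dropped, each row is renormalised over earlier tokens, and influence is pushed backwards from the answer tokens in reverse token order (a valid topological order for a causal DAG). The resulting per-token weights then rescale a single sequence-level advantage. The functions `aggregate_attention`, `answer_flow` and `token_advantages` are hypothetical names, and the rescaling rule is an assumption, not the paper's method.

```python
import numpy as np


def aggregate_attention(attn: np.ndarray) -> np.ndarray:
    """Collapse (layers, heads, T, T) causal attention into one DAG adjacency.

    attn[l, h, i, j] is how much token i attends to token j (j <= i).
    Assumption: plain averaging over layers and heads; self-edges dropped;
    each row renormalised over strictly earlier tokens.
    """
    A = attn.mean(axis=(0, 1))
    A = np.tril(A, k=-1)  # edge j -> i only for j < i, so the graph is acyclic
    row_sums = A.sum(axis=1, keepdims=True)
    return np.divide(A, row_sums, out=np.zeros_like(A), where=row_sums > 0)


def answer_flow(A: np.ndarray, answer_idx: list[int]) -> np.ndarray:
    """Propagate influence backwards from the answer tokens.

    flow[j] = seed[j] + sum_{i > j} A[i, j] * flow[i], computed from the last
    token to the first so every flow[i] with i > j is final when used.
    """
    T = A.shape[0]
    flow = np.zeros(T)
    flow[answer_idx] = 1.0  # seed the answer tokens
    for j in range(T - 1, -1, -1):
        flow[j] += A[j + 1:, j] @ flow[j + 1:]
    return flow


def token_advantages(flow: np.ndarray, advantage: float,
                     response_mask: np.ndarray) -> np.ndarray:
    """Rescale a sequence-level advantage into per-token advantages.

    Assumption: weights are the flow values on response tokens, normalised to
    mean 1 so total credit equals advantage * number of response tokens.
    """
    w = np.where(response_mask, flow, 0.0)
    mean_w = w[response_mask].mean()
    if mean_w > 0:
        w = w / mean_w
    return advantage * w
```

Because every row of `A` sums to 1 in this sketch, influence tends to drain toward the earliest tokens of the sequence; that is why credit is only distributed over response tokens via `response_mask`.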

Representation-Aware Advantage Estimation: Your Reward Model Provides More Than A Scalar Output

The paper introduces Representation-Aware Advantage Estimation (RAAE), instantiated as Graph-based Advantage Estimation (GraphAE), which uses the hidden states of reward models (RMs), not just their scalar outputs, to improve advantage estimation in reinforcement learning from human feedback (RLHF). Each sampled group of responses is modeled as a graph whose nodes are responses and whose edges connect responses that are similar in RM hidden space; propagating information along these edges lets each response's advantage reflect its context within the group. The authors report significant gains across multiple benchmarks, suggesting that RM representations can improve sample efficiency and robustness in RLHF.

arXiv cs.CL · 23 d ago · found 21 d ago · Training
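The summary describes GraphAE only at the level of "responses as nodes, RM-space similarity as edges, information propagated along edges". The sketch below fills in everything else with assumptions: cosine similarity with a softmax over neighbours as edge weights, a single smoothing step that mixes each reward with its neighbours' rewards, and GRPO-style group normalisation at the end. `graph_advantages`, `alpha` and `tau` are hypothetical; this is a label-propagation-style stand-in, not the paper's estimator.

```python
import numpy as np


def graph_advantages(rewards, hidden, alpha=0.5, tau=0.1, eps=1e-6):
    """Group-relative advantages after smoothing rewards over an RM-similarity graph.

    rewards: (G,) scalar RM scores for G sampled responses to one prompt.
    hidden:  (G, d) RM hidden states for the same responses (assumed given).
    alpha:   weight on a response's own reward (alpha=1 disables the graph).
    tau:     softmax temperature over cosine similarities (smaller = sharper).
    """
    r = np.asarray(rewards, dtype=float)
    H = np.asarray(hidden, dtype=float)
    assert r.shape[0] >= 2, "need at least two responses to form a graph"
    # Cosine similarity between every pair of responses in RM hidden space.
    Hn = H / np.clip(np.linalg.norm(H, axis=1, keepdims=True), eps, None)
    S = Hn @ Hn.T
    np.fill_diagonal(S, -np.inf)  # no self-edges
    # Row-wise softmax turns similarities into edge weights that sum to 1.
    logits = S / tau
    logits -= logits.max(axis=1, keepdims=True)
    W = np.exp(logits)
    W /= W.sum(axis=1, keepdims=True)
    # One propagation step: mix each reward with its neighbours' rewards.
    smoothed = alpha * r + (1.0 - alpha) * (W @ r)
    # GRPO-style normalisation within the group.
    return (smoothed - smoothed.mean()) / (smoothed.std() + eps)
```

Setting `alpha=1.0` removes the graph term and recovers plain group-normalised rewards, which makes the sketch easy to compare against a GRPO-style baseline.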

SpenseGPT: Practical One-shot Pruning Enabling Sparse and Dense GEMMs for LLM Inference

The article introduces SpenseGPT, a one-shot post-training pruning method that stores weights in a hybrid sparse-dense format so that semi-structured 2:4 sparsity (at most two nonzero values in every group of four consecutive weights) can be exploited efficiently alongside dense GEMMs. On B200 GPUs with FP8 precision, it achieves up to 1.2x end-to-end decoding speedup on Qwen3-32B and Seed-OSS-36B while maintaining accuracy. For practitioners, it offers a practical way to speed up LLM inference without specialized compiler support and without sacrificing model quality.

arXiv cs.CL · 23 d ago · found 21 d ago · Inference
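To make the 2:4 pattern and the "sparse plus dense GEMM" idea concrete, the following NumPy sketch prunes each group of four consecutive input weights to its two largest magnitudes, keeps the rows that lose the most weight energy dense, and runs the two row blocks as separate matrix multiplies. Everything beyond the 2:4 constraint is an assumption: SpenseGPT's actual pruning criterion, error compensation and sparse/dense partitioning are not given in the summary, and `mask_2_4`, `hybrid_split`, `hybrid_matmul` and `dense_frac` are hypothetical names. NumPy has no sparse tensor-core kernels, so this only checks the arithmetic, not the speedup.

```python
import numpy as np


def mask_2_4(W: np.ndarray) -> np.ndarray:
    """Keep the 2 largest-magnitude weights in every group of 4 consecutive
    entries along the input (last) dimension; prune the other 2."""
    out_f, in_f = W.shape
    assert in_f % 4 == 0, "input dimension must be a multiple of 4"
    groups = np.abs(W).reshape(out_f, in_f // 4, 4)
    drop = np.argsort(groups, axis=-1)[..., :2]  # two smallest per group
    mask = np.ones(groups.shape, dtype=bool)
    np.put_along_axis(mask, drop, False, axis=-1)
    return mask.reshape(out_f, in_f)


def hybrid_split(W: np.ndarray, dense_frac: float = 0.1):
    """Hypothetical split: rows with the largest pruning error stay dense,
    all other rows are stored 2:4-sparse."""
    mask = mask_2_4(W)
    err = ((W * ~mask) ** 2).sum(axis=1)  # squared magnitude removed per row
    n_dense = int(round(dense_frac * W.shape[0]))
    dense_rows = np.sort(np.argsort(err)[::-1][:n_dense])
    sparse_rows = np.setdiff1d(np.arange(W.shape[0]), dense_rows)
    return sparse_rows, (W * mask)[sparse_rows], dense_rows, W[dense_rows]


def hybrid_matmul(x, sparse_rows, W_sparse, dense_rows, W_dense):
    """y = x @ W_hat.T computed as two GEMMs; on a GPU the first would use a
    2:4 sparse kernel and the second a dense one."""
    y = np.empty((x.shape[0], len(sparse_rows) + len(dense_rows)),
                 dtype=np.result_type(x, W_sparse, W_dense))
    y[:, sparse_rows] = x @ W_sparse.T
    y[:, dense_rows] = x @ W_dense.T
    return y
```

In this sketch, `dense_frac` trades speed for fidelity: `dense_frac=1.0` reproduces the original dense layer exactly, while `dense_frac=0.0` gives a fully 2:4-sparse layer.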
