Today's highlights include **TruthRL**, a reinforcement learning framework that reduces hallucinations in large language models (LLMs) from 43.5% to 19.4% by optimizing for truthfulness and appropriate abstention. The **QR-MAX** algorithm for model-based reinforcement learning addresses non-Markovian reward decision processes and achieves PAC convergence with polynomial sample complexity. **Scone**, a unified understanding-generation model for subject-driven image generation, is reported to outperform existing models on multiple benchmarks (Scone: Bridging Composition and Distinction in Subject-Driven Image Generation via Unified Understanding-Generation Modeling). Together, these results span LLM reliability, sample-efficient RL theory, and image generation.
1.
ERAlign: Energy-based Representation Alignment of GNNs and LLMs on Text-attributed Graphs
The paper introduces ERAlign, an Energy-based Representation Alignment framework for integrating Graph Neural Networks (GNNs) and Large Language Models (LLMs) on Text-attributed Graphs (TAGs). ERAlign projects GNN-encoded structure and LLM-derived text embeddings into a shared latent space and optimizes their alignment with an Energy-based Model objective. The authors report improved representation consistency and state-of-the-art performance across eight TAG datasets under varying levels of supervision. According to the paper, the approach mitigates representation drift and improves generalization, which makes it relevant to practitioners combining graph structure with textual data.
arXiv cs.AI · 9 d ago · Research
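The summary does not give ERAlign's exact energy function. The following is a minimal sketch that assumes linear projection heads, a cosine-based energy with a temperature `tau`, and in-batch negatives. These are our illustrative choices and are not necessarily the paper's. It shows how an energy-based alignment objective between GNN and LLM embeddings could be written. All function names are hypothetical.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def project(x: Vector, weights: Matrix) -> Vector:
    """Map an embedding into the shared latent space with a linear layer (no bias)."""
    return [sum(w * xj for w, xj in zip(row, x)) for row in weights]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def energy(z_graph: Vector, z_text: Vector, tau: float = 0.1) -> float:
    """Assumed energy: negative cosine similarity scaled by a temperature.
    Lower energy means a better-aligned (graph, text) pair."""
    return -cosine(z_graph, z_text) / tau


def alignment_loss(graph_embs: List[Vector], text_embs: List[Vector], tau: float = 0.1) -> float:
    """Contrastive EBM objective (InfoNCE-style): each node's own text is the
    positive pair and the other texts in the batch serve as negatives."""
    n = len(graph_embs)
    total = 0.0
    for i in range(n):
        energies = [energy(graph_embs[i], text_embs[j], tau) for j in range(n)]
        # log Z_i = log sum_j exp(-E_ij), computed with the max trick for stability
        m = max(-e for e in energies)
        log_z = m + math.log(sum(math.exp(-e - m) for e in energies))
        # -log p(t_i | g_i) = E_ii + log Z_i, which is always >= 0
        total += energies[i] + log_z
    return total / n
```

Consider a batch in which each node's projected text embedding points the same way as its graph embedding. Such a batch scores a much lower loss than the same batch with the texts swapped between nodes. Minimizing this loss pulls matched pairs toward low energy and pushes mismatched pairs toward high energy.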
2.
On the Condition Number Dependency in Bilevel Optimization
The paper presents new lower bounds on the oracle complexity of finding $\epsilon$-stationary points in bilevel optimization, focusing on the case where the upper-level problem is nonconvex and the lower-level problem is strongly convex. It establishes a lower bound of $\Omega(\kappa_y^{5/2} \epsilon^{-2})$, where $\kappa_y$ is the condition number of the lower-level problem. This bound reveals a gap in condition-number dependence between bilevel and minimax problems. The results also extend to other settings, including high-order smooth functions and stochastic oracles. For practitioners, the bounds show how ill-conditioning of the lower-level problem limits the achievable oracle complexity, which may guide the design of more efficient algorithms.
arXiv cs.AI · 9 d ago · Training
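For reference, the standard nonconvex-strongly-convex bilevel setting can be written as follows. The notation is our assumption because the summary does not fix it. Here $f$ is the upper-level objective and $g$ the lower-level objective. We assume $g(x,\cdot)$ is $\mu$-strongly convex with an $L$-Lipschitz gradient, so that $\kappa_y = L/\mu$. The hypergradient follows from the implicit function theorem applied to $\nabla_y g(x, y^*(x)) = 0$.

$$
\min_{x \in \mathbb{R}^{d_x}} \; \Phi(x) := f\big(x, y^*(x)\big), \qquad y^*(x) = \arg\min_{y \in \mathbb{R}^{d_y}} g(x, y)
$$

$$
\nabla \Phi(x) = \nabla_x f\big(x, y^*(x)\big) - \nabla^2_{xy} g\big(x, y^*(x)\big)\,\big[\nabla^2_{yy} g\big(x, y^*(x)\big)\big]^{-1} \nabla_y f\big(x, y^*(x)\big)
$$

$$
x \text{ is } \epsilon\text{-stationary} \iff \|\nabla \Phi(x)\| \le \epsilon
$$

The inverse Hessian $[\nabla^2_{yy} g]^{-1}$ can have norm as large as $1/\mu$. This gives some intuition for why the oracle complexity grows with $\kappa_y$.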
3.
TruthRL: Incentivizing Truthful LLMs via Reinforcement Learning
TruthRL is a reinforcement learning framework that improves the truthfulness of large language models (LLMs) by optimizing for both accurate responses and appropriate abstention when the model is uncertain. It is trained with Group Relative Policy Optimization (GRPO) and uses a ternary reward that distinguishes among correct answers, hallucinations, and abstentions. Across four knowledge-intensive benchmarks, TruthRL reduces hallucinations from 43.5% to 19.4% and increases truthfulness from 5.3% to 37.2%. For practitioners, it addresses accuracy and uncertainty handling together, which supports more reliable deployment in settings where a confident wrong answer costs more than an abstention.
arXiv cs.AI · 9 d ago · Safety
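The ternary reward is the core idea, and it fits in a few lines. This is a minimal illustration and not TruthRL's implementation. The abstention phrases, the substring-based correctness check, and the function names are hypothetical stand-ins. The second function shows the group-relative normalization that GRPO applies to the rewards of several responses sampled for the same prompt.

```python
import statistics
from typing import List

# Hypothetical abstention markers; a real system would use a trained or rule-based judge.
ABSTAIN_PHRASES = ("i don't know", "i am not sure", "cannot answer")


def ternary_reward(response: str, gold: str) -> float:
    """+1 for a correct answer, 0 for an abstention, -1 for a hallucination."""
    text = response.strip().lower()
    if any(phrase in text for phrase in ABSTAIN_PHRASES):
        return 0.0
    # Naive substring match as a stand-in for a proper answer grader.
    if gold.strip().lower() in text:
        return 1.0
    return -1.0


def group_relative_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """GRPO-style advantages: normalize each response's reward by the mean and
    standard deviation of its group (all samples for the same prompt).
    Uses the population std; implementations differ on this detail."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


responses = ["Paris.", "I don't know.", "Lyon."]
rewards = [ternary_reward(r, "Paris") for r in responses]
advantages = group_relative_advantages(rewards)
```

Abstention scores 0 rather than -1. As a result, an abstaining response earns a higher advantage than a wrong guess in the same group. Under a binary correct/incorrect reward, the two would be indistinguishable.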
Models & Releases
**TruthRL** targets more reliable LLMs by optimizing directly for truthfulness, which reduces hallucinations while rewarding appropriate abstention. In reinforcement learning, the **QR-MAX** algorithm addresses discrete non-Markovian reward decision processes and achieves PAC convergence with polynomial sample complexity. This matters to practitioners whose rewards depend on temporal history. **Scone** is a unified model for subject-driven image generation that is reported to surpass existing models on several benchmarks, which could benefit visual generation tasks (Scone: Bridging Composition and Distinction in Subject-Driven Image Generation via Unified Understanding-Generation Modeling).
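The summary does not say how QR-MAX represents non-Markovian rewards. One common, generic device for such problems is a reward machine. This is a small automaton that tracks the relevant part of the history, so the reward becomes Markovian over the product state (environment state, automaton state). A model-based learner such as R-MAX can then treat that product as an ordinary MDP. This is a sketch of that general idea, not QR-MAX itself. The task, labels, and names below are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical task: reward 1 the first time the agent observes "B" after having observed "A".
# The reward depends on history, so it is non-Markovian in the observation alone,
# but it is Markovian in the pair (observation, reward-machine state q).
RM_TRANSITIONS: Dict[Tuple[int, str], int] = {
    (0, "A"): 1,  # saw A, now waiting for B
    (1, "B"): 2,  # saw A then B: task complete (absorbing)
}
RM_REWARD: Dict[Tuple[int, int], float] = {(1, 2): 1.0}


def rm_step(q: int, label: str) -> Tuple[int, float]:
    """Advance the reward machine on one observation label; stay put on irrelevant labels."""
    q_next = RM_TRANSITIONS.get((q, label), q)
    return q_next, RM_REWARD.get((q, q_next), 0.0)


def episode_return(labels: List[str]) -> float:
    """Sum the rewards the machine emits along a sequence of observation labels."""
    q, total = 0, 0.0
    for label in labels:
        q, r = rm_step(q, label)
        total += r
    return total
```

For example, `["A", "C", "B"]` earns a return of 1.0. In contrast, `["B", "C", "B"]` earns 0.0 because "B" never follows "A".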
Research
The **Lost in Serialization** paper examines the invariance and generalization of LLM graph reasoners and finds that they are sensitive to variations in how a graph is serialized as text. **V-REX** introduces a benchmark suite for exploratory visual reasoning in vision-language models, built around chains of questions (V-REX: Benchmarking Exploratory Visual Reasoning via Chain-of-Questions). The **Deep Generative Model for Human Mobility Behavior** presents a diffusion-based framework for simulating activity-travel sequences, which could inform urban planning (Deep Generative Model for Human Mobility Behavior).
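"Serialization variations" can be made concrete with a small example. The same graph can be written out as text in many ways, for example by reordering edges or relabeling nodes. The sketch below uses a hypothetical `u -- v` edge format. It produces two serializations that differ as strings while describing isomorphic graphs. A serialization-invariant reasoner should give the same answer to structural questions, such as the degree sequence, for both.

```python
from typing import Dict, List, Tuple

Edge = Tuple[int, int]


def serialize(edges: List[Edge]) -> str:
    """One possible textual encoding of an undirected graph for an LLM prompt."""
    return "; ".join(f"{u} -- {v}" for u, v in edges)


def relabel(edges: List[Edge], perm: Dict[int, int]) -> List[Edge]:
    """Rename nodes with a permutation; the graph structure is unchanged."""
    return [(perm[u], perm[v]) for u, v in edges]


def degree_sequence(edges: List[Edge]) -> List[int]:
    """Sorted node degrees, a property that is invariant under isomorphism."""
    deg: Dict[int, int] = {}
    for u, v in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    return sorted(deg.values())


edges = [(0, 1), (1, 2), (2, 0), (2, 3)]       # a triangle with one pendant node
perm = {0: 3, 1: 0, 2: 1, 3: 2}
variant = relabel(list(reversed(edges)), perm)  # reorder edges, then relabel nodes
```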