Daily digest — 2026-06-11

CLP: Collocation-Length Prediction for Zero-Loss Adaptive Multi-Token Inference

The paper introduces CLP (Collocation-Length Predictor), a method for improving multi-token prediction (MTP) in large language models by mitigating head-backbone competition during autoregressive decoding. CLP adds a lightweight span-level decision layer with only 4.6K–7.7K parameters and achieves speedups of 1.20x–1.29x on 1.5B Qwen2.5 models and 1.14x–1.20x on 7B models without quality degradation (repetition ratio < 0.02), whereas prior gate-based methods showed significant quality loss. The work also offers a roadmap for improving MTP head prediction accuracy, a key factor in accelerating inference in large-scale models.

arXiv cs.AI · 47 d ago · Inference

Janus: A Benchmark for Goal-Conditioned Information Distortion in LLMs

The paper presents JANUS, a benchmark for evaluating goal-conditioned information distortion in large language models (LLMs). It consists of 160 scenarios across 8 domains and compares neutral and goal-directed prompts over a fixed pool of factual information to assess how models distort facts. For practitioners, the benchmark highlights how framing and incentives can lead LLMs to produce misleading outputs, underscoring the need for stronger safeguards against such distortion in AI applications.

arXiv cs.AI · 47 d ago · Safety

Dynamic Linear Attention

The paper introduces Dynamic Linear Attention (DLA), a framework that extends multi-state linear attention for large language models (LLMs) with two components: Information-Aware Dynamic State Merging and Capacity-Bounded Memory Modeling. DLA adaptively merges states based on token importance, which improves representation capacity for long contexts while keeping the memory cache at a fixed size. In experiments across 16 datasets, DLA outperforms existing state-of-the-art methods, making it relevant for practitioners optimizing LLM performance in long-context scenarios.

arXiv cs.AI · 47 d ago · Models
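
As a rough illustration of the Dynamic Linear Attention summary above, the idea of merging states by importance under a fixed-size cache can be pictured with a toy sketch. It is not the paper's algorithm: the `State` class, the `merge` and `add_state` helpers, the weighted-average merge rule, and the scalar importance scores are all hypothetical choices made here for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class State:
    vec: List[float]  # hypothetical summary vector for a group of tokens
    weight: float     # hypothetical importance score (assumed positive)


def merge(a: State, b: State) -> State:
    """Combine two states into one using an importance-weighted average."""
    total = a.weight + b.weight
    vec = [(a.weight * x + b.weight * y) / total for x, y in zip(a.vec, b.vec)]
    return State(vec, total)


def add_state(cache: List[State], new: State, capacity: int) -> List[State]:
    """Append a state, then merge the two least important states
    until the cache is back within its fixed capacity."""
    cache = cache + [new]
    while len(cache) > capacity:
        cache.sort(key=lambda s: s.weight)  # lowest importance first
        cache = [merge(cache[0], cache[1])] + cache[2:]
    return cache


# Toy run: three incoming states, cache limited to two slots.
cache: List[State] = []
for vec, weight in [([0.0], 1.0), ([10.0], 3.0), ([3.0], 2.0)]:
    cache = add_state(cache, State(vec, weight), capacity=2)
```

In this toy version the cache never grows beyond `capacity`, and the total importance mass is preserved across merges; how the paper actually scores tokens and merges states is not described in the summary.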
