Daily digest — 2026-07-01

Mechanistic Analysis of Alignment Algorithms in Language Models

The paper presents a mechanistic analysis of six alignment algorithms (PPO, DPO, SimPO, ORPO, GRPO, and KTO), evaluating how each changes a language model's internal computations across three open-weight model families. Preference signals localize in early-mid or mid-late layers, but the algorithms induce distinct geometric transformations: KTO and GRPO enhance linear separability, whereas DPO and ORPO degrade it. The authors argue that these differences call for mechanism-aware optimization objectives and standardized auditing for safety and interpretability, and the results show practitioners how heterogeneously different alignment strategies affect model behavior.

arXiv cs.CL · 22 d ago · found 20 d ago · Research

Attention Amnesia in Hybrid LLMs: When CoT Fine-Tuning Breaks Long-Range Recall, and How to Fix It

The paper identifies "Attention Amnesia" in hybrid linear-attention models: chain-of-thought (CoT) supervised fine-tuning (SFT) degrades long-context recall, as observed in models such as HypeNet and Jet-Nemotron. The authors show that CoT-SFT sharply reduces retrieval performance on the Needle-In-A-Haystack (NIAH) benchmark, with HypeNet-9B dropping from 67.2% to 9.4% on NIAH-S2@256K. They propose QK-Restore, a method that selectively restores query-key projections from pre-SFT checkpoints and improves long-context performance without additional training (e.g., HypeNet-5B on S3@256K rises from 65.4% to 76.4%), giving practitioners a practical fix for this kind of degradation.

arXiv cs.CL · 22 d ago · found 20 d ago · Training
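
The idea behind QK-Restore, as summarized above, can be illustrated with a minimal sketch. It assumes checkpoints are plain name-to-weights mappings and that query and key projections can be recognized by the substrings "q_proj" and "k_proj" in the parameter name. Both the parameter names and the selection rule are hypothetical; the digest does not say which layers or projections the paper actually restores.

```python
from typing import Dict, List, Sequence

# Toy stand-in for a checkpoint: parameter name -> flat list of weights.
Weights = Dict[str, List[float]]


def restore_qk(
    pre_sft: Weights,
    post_sft: Weights,
    markers: Sequence[str] = ("q_proj", "k_proj"),  # hypothetical naming convention
) -> Weights:
    """Return a copy of post_sft whose query/key projections come from pre_sft.

    Assumption: a parameter counts as a query/key projection if its name
    contains one of `markers`. The paper's real selection rule may differ.
    """
    merged: Weights = {}
    for name, weights in post_sft.items():
        is_qk = any(marker in name for marker in markers)
        if is_qk and name in pre_sft:
            # Take the pre-SFT value; no training step is involved.
            merged[name] = list(pre_sft[name])
        else:
            # Keep everything else exactly as fine-tuning left it.
            merged[name] = list(weights)
    return merged


# Tiny illustrative checkpoints (made-up numbers, not real model weights).
pre = {
    "layers.0.attn.q_proj.weight": [0.1, 0.2],
    "layers.0.attn.k_proj.weight": [0.3, 0.4],
    "layers.0.attn.v_proj.weight": [0.5, 0.6],
}
post = {
    "layers.0.attn.q_proj.weight": [0.9, 0.8],
    "layers.0.attn.k_proj.weight": [0.7, 0.6],
    "layers.0.attn.v_proj.weight": [0.55, 0.65],
}

merged = restore_qk(pre, post)
```

In this toy run, `merged` carries the pre-SFT query and key weights and the post-SFT value weights, and `post` itself is left unmodified. With a real framework the same pattern would apply to the model's saved parameter dictionary rather than to Python lists.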

The Shibboleth Effect: Auditing the Cross-Lingual Distributional Skew of Large Language Models

This study introduces the Shibboleth Effect, a cross-lingual distributional skew, and examines it in six frontier large language models (LLMs): GPT-4o, Llama-4, Mistral-Large, Gemini-3.1-Pro, Qwen3.6-Plus, and DeepSeek-R1, using a multi-agent geopolitical simulation. Changing the language of the simulation produces significant behavioral shifts: Llama-4 exhibits more coercive rhetoric in Turkish, while Gemini-3.1-Pro and DeepSeek-R1 exhibit less, suggesting that model architecture and training influence cross-lingual behavior. The results highlight the need to evaluate LLM behavior carefully in multilingual contexts, particularly in sensitive applications such as diplomacy and crisis management.

arXiv cs.CL · 22 d ago · found 20 d ago · Research

The day in AI, distilled.
