Daily digest — 2026-07-05

Skill-RAG: Failure-State-Aware Retrieval Augmentation via Hidden-State Probing and Skill Routing

Skill-RAG is a failure-aware framework for Retrieval-Augmented Generation (RAG) that combines a hidden-state prober with a prompt-based skill router to address misalignment between queries and evidence. By diagnosing retrieval failures and then selecting one of four skills (query rewriting, question decomposition, evidence focusing, or an exit skill), the framework improves retrieval efficiency and accuracy, particularly on challenging open-domain QA and reasoning benchmarks. For practitioners, it offers a structured way to improve LLM performance on complex queries where traditional retrieval fails.

arXiv cs.CL · 26 d ago · found 24 d ago · RAG
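
The Skill-RAG entry above describes a diagnose-then-route loop, which can be sketched roughly as follows. This is only an illustration of the control flow under stated assumptions, not the authors' implementation: the names `State`, `run_skill_loop`, `prober`, `router`, and `skills` are hypothetical, the prober is modeled as a plain callable rather than a probe over model hidden states, and the behavior of each skill is left to the caller.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict, List


class Skill(Enum):
    # The four skills named in the summary.
    REWRITE = "query rewriting"
    DECOMPOSE = "question decomposition"
    FOCUS = "evidence focusing"
    EXIT = "exit"


@dataclass
class State:
    query: str
    evidence: List[str]


def run_skill_loop(
    state: State,
    prober: Callable[[State], bool],   # hypothetical: True means "retrieval failed"
    router: Callable[[State], Skill],  # hypothetical: picks one of the four skills
    skills: Dict[Skill, Callable[[State], State]],  # hypothetical skill implementations
    max_rounds: int = 3,               # assumed safety cap, not from the paper
) -> State:
    """Apply skills until the prober reports no failure or the router exits."""
    for _ in range(max_rounds):
        if not prober(state):
            break  # evidence looks adequate, stop here
        skill = router(state)
        if skill is Skill.EXIT:
            break  # router decided further retrieval is not worthwhile
        state = skills[skill](state)
    return state
```

With a toy prober that flags empty evidence and a router that always chooses `Skill.REWRITE`, the loop applies the rewrite skill once and stops as soon as evidence is present.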

On Cost-Effective LLM-as-a-Judge Improvement Techniques

The paper presents four cost-effective techniques for improving the accuracy of LLM judges used in reinforcement learning from human feedback (RLHF): ensemble scoring, task-specific criteria injection, calibration context, and adaptive model escalation. On RewardBench 2, ensemble scoring combined with criteria injection reaches 85.8% accuracy, 13.5 percentage points above the baseline, and small models benefit significantly from these methods. For practitioners, these are scalable ways to make LLM evaluation more reliable without incurring substantial cost.

arXiv cs.CL · 26 d ago · found 24 d ago · Training
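
Three of the techniques named in the LLM-as-a-judge entry above (criteria injection, ensemble scoring, and model escalation) might be combined as in the sketch below. It rests on assumptions throughout: the `Judge` type, the function names, the prompt layout, and the rule "escalate when the ensemble's scores disagree by more than `max_spread`" are all hypothetical, and the paper may define these steps differently.

```python
from statistics import mean, pstdev
from typing import Callable, Sequence

# Hypothetical judge interface: takes a prompt, returns a numeric score.
Judge = Callable[[str], float]


def build_prompt(response: str, criteria: Sequence[str]) -> str:
    """Criteria injection: prepend task-specific criteria to the judge prompt."""
    bullet_lines = "\n".join(f"- {c}" for c in criteria)
    return f"Criteria:\n{bullet_lines}\n\nResponse:\n{response}"


def judge_with_escalation(
    response: str,
    criteria: Sequence[str],
    small_judge: Judge,
    large_judge: Judge,
    k: int = 3,               # assumed ensemble size
    max_spread: float = 1.0,  # assumed disagreement threshold
) -> float:
    """Average k small-judge scores; escalate to the large judge on disagreement."""
    prompt = build_prompt(response, criteria)
    scores = [small_judge(prompt) for _ in range(k)]
    if pstdev(scores) > max_spread:
        # Assumed trigger: high spread means the cheap ensemble is unsure.
        return large_judge(prompt)
    return mean(scores)
```

The design choice here is to pay for the larger model only when the cheap ensemble is inconsistent, which is one plausible reading of "adaptive model escalation".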

HarDBench: A Benchmark for Draft-Based Co-Authoring Jailbreak Attacks for Safe Human-LLM Collaborative Writing

The paper introduces HarDBench, a benchmark for evaluating how vulnerable large language models (LLMs) are to draft-based co-authoring jailbreak attacks, in which malicious users exploit incomplete drafts to elicit harmful outputs. It covers high-risk domains such as Explosives, Drugs, Weapons, and Cyberattacks, and uses realistically structured prompts to assess model susceptibility. The authors also propose a safety-utility balanced alignment approach that significantly reduces harmful outputs while maintaining co-authoring performance, and the work highlights the need for robust evaluation frameworks in human-LLM collaborative writing.

arXiv cs.CL · 26 d ago · found 24 d ago · Safety
