Daily digest — 2026-06-28

Trace Only What You Need: Structure-Aware On-Demand Hypergraph Memory for Long-Document Question Answering

The paper introduces DocTrace, a multi-agent retrieval-augmented generation (RAG) framework for long-document question answering (QA). It combines a lightweight structural tree index of the document with a hypergraph-structured working memory that is built on demand, triggered by the query and guided by prior experience, to address limitations in knowledge organization and reasoning reuse. In the reported experiments, DocTrace outperforms the ComoRAG baseline by up to 8.85% in F1 and 4.40% in EM across multiple datasets while reducing computational cost by 53.32%, which makes it relevant to practitioners working on long-document QA.

arXiv cs.CL · 19 d ago · found 17 d ago · RAG

Pushing the Limits of LLM Tool Calling via Experiential Knowledge Integration and Activation

The paper introduces the Knowledge-Augmented Tool Execution (KATE) framework, which improves tool use in large language models (LLMs) by integrating experiential knowledge and adjusting inference strategies. Two findings stand out: widening reasoning through parallel sampling substantially activates latent knowledge, and post-training on knowledge-augmented data with reinforcement learning outperforms traditional supervised fine-tuning. Experiments on BFCL-V3 and AppWorld show substantial improvements over existing baselines, which underscores the value of effective knowledge integration for practitioners building autonomous AI agents.

arXiv cs.CL · 19 d ago · found 17 d ago · Agents
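
One way to build intuition for why wider parallel sampling can help is a simple coverage calculation. The sketch below is not taken from the KATE paper: it assumes, purely for illustration, that each sampled tool call is independently correct with probability p, and the function name `coverage_at_k` is hypothetical.

```python
def coverage_at_k(p: float, k: int) -> float:
    """Probability that at least one of k independent samples is correct.

    Illustrative assumption: every sample succeeds independently with the
    same probability p. Real samples from one model are correlated, so this
    is an idealized estimate rather than a measured quantity.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    if k < 1:
        raise ValueError("k must be >= 1")
    # Complement of "all k samples are wrong".
    return 1.0 - (1.0 - p) ** k


if __name__ == "__main__":
    # Hypothetical per-sample success rate, chosen only for the demo.
    for k in (1, 2, 4, 8):
        print(f"k={k}: coverage={coverage_at_k(0.3, k):.4f}")
```

Coverage only says that a correct call exists somewhere among the samples. Picking it out still needs a selection step, which this sketch leaves out.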

ConvMemory v2: A Recall-Preserving Top-10 Evidence Reranker for Conversational Memory Retrieval

ConvMemory v2 is a token-evidence reranker that reorders the protected top-10 candidate set produced by ConvMemory v1, leaving the recall metrics unchanged. The model is a fine-tuned ms-marco-MiniLM-L-6-v2 cross-encoder with 22,713,601 parameters. On the LoCoMo conversational memory benchmark it reaches a FULL MRR of 0.6560, compared with 0.5824 for v1, while Recall@10 and Hit@10 stay identical. For practitioners, the result shows that retrieval quality in memory-based conversational systems can be improved without the computational cost of more complex models.

arXiv cs.CL · 19 d ago · found 17 d ago · RAG
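
The claim that reordering a fixed top-10 set can raise MRR while leaving Recall@10 and Hit@10 untouched can be checked with a small sketch. The metric functions below follow the standard definitions. The candidate ids, the relevance labels, and the reranked order are made-up illustrations, not data from the paper or from LoCoMo.

```python
from typing import Sequence, Set


def reciprocal_rank(ranked: Sequence[str], relevant: Set[str]) -> float:
    """1 / rank of the first relevant item, or 0.0 if none is present."""
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            return 1.0 / rank
    return 0.0


def recall_at_k(ranked: Sequence[str], relevant: Set[str], k: int = 10) -> float:
    """Fraction of the relevant items that appear in the top k."""
    if not relevant:
        return 0.0
    return len(set(ranked[:k]) & relevant) / len(relevant)


def hit_at_k(ranked: Sequence[str], relevant: Set[str], k: int = 10) -> float:
    """1.0 if any relevant item appears in the top k, else 0.0."""
    return float(any(item in relevant for item in ranked[:k]))


# Hypothetical first-stage output: ten candidate ids in retrieval order.
first_stage = [f"m{i}" for i in range(10)]
# Hypothetical labels: "m6" was retrieved, "m42" was missed entirely.
relevant = {"m6", "m42"}
# Hypothetical reranker output: same ten ids, with "m6" moved to the front.
reranked = ["m6"] + [m for m in first_stage if m != "m6"]

if __name__ == "__main__":
    for name, order in (("first stage", first_stage), ("reranked", reranked)):
        print(
            name,
            f"RR={reciprocal_rank(order, relevant):.4f}",
            f"Recall@10={recall_at_k(order, relevant):.2f}",
            f"Hit@10={hit_at_k(order, relevant):.0f}",
        )
```

Because the reranker only permutes the same ten candidates, the set-based metrics at k=10 cannot move. Only order-sensitive metrics such as MRR can change.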
