Today's highlights include **AgentPLM**, a protein language model that consults external biophysical feedback in real time during protein sequence design and reports state-of-the-art results in antibody optimization. **Parthenon** is a self-evolving legal-agent framework that improves legal-domain large language models through a learning loop for continuous improvement. **Adaptive Teacher Exposure for Self-Distillation** proposes a way to tune how much reference reasoning the teacher sees during self-distillation, which improves reasoning in large language models. Together, these papers show LLM work extending across biology, law, and training methods.
the top three that day
1. AgentPLM: Agentic Protein Language Models with Reasoning-Augmented Decoding for Protein Sequence Design
AgentPLM is a protein language model that integrates Reasoning-Augmented Decoding (RAD) and Contrastive Agent Policy Optimisation (CAPO), letting the model consult external biophysical feedback in real time during protein sequence design. It outperforms existing passive models on benchmark tasks, achieving state-of-the-art results in antibody optimization and other applications, with improved hit rates and online error correction. For practitioners, this enables more adaptive and efficient protein design and could lead to better therapeutic candidates.
arXiv cs.AI · 9 d ago · Agents
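As a rough illustration of what consulting external feedback during decoding can look like, the sketch below runs a greedy decoder that scores every candidate residue with a toy checker before committing to it. It is not AgentPLM's RAD or CAPO: `propose`, `toy_feedback`, the hydrophobic grouping, and the 0.5 threshold are hypothetical stand-ins invented for this example.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
HYDROPHOBIC = set("AILMFVW")  # toy grouping, assumed for illustration only


def toy_feedback(sequence: str) -> float:
    """Hypothetical stand-in for an external biophysical tool.

    Returns a penalty that grows once more than half of the residues
    are in the toy hydrophobic group.
    """
    if not sequence:
        return 0.0
    fraction = sum(r in HYDROPHOBIC for r in sequence) / len(sequence)
    return max(0.0, fraction - 0.5)


def propose(rng: random.Random, k: int) -> list[str]:
    """Hypothetical stand-in for the language model: k random candidate residues."""
    return rng.sample(list(AMINO_ACIDS), k)


def decode_with_feedback(length: int, k: int = 4, seed: int = 0) -> str:
    """Greedy decoding that asks the feedback tool about each candidate."""
    rng = random.Random(seed)
    sequence = ""
    for _ in range(length):
        candidates = propose(rng, k)
        # Consult the tool on every possible extension, keep the least penalised.
        best = min(candidates, key=lambda r: toy_feedback(sequence + r))
        sequence += best
    return sequence


if __name__ == "__main__":
    designed = decode_with_feedback(length=30)
    print(designed, toy_feedback(designed))
```

A passive model would correspond to picking `candidates[0]` without calling `toy_feedback`; the only difference in this sketch is the extra tool call inside the decoding loop.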
2. Parthenon Law: A Self-Evolving Legal-Agent Framework
The paper introduces Parthenon, a self-evolving legal-agent framework designed to improve the performance of legal-domain large language models (LLMs) by addressing key challenges in deploying legal agents. It includes a large-scale empirical study of 12,510 agent trajectories showing that accuracy improves with stronger models but matter completion remains inadequate. The framework adds a learning loop in which agents refine their skills and knowledge based on past performance, so they keep improving without any change to model weights, a property that matters to practitioners building reliable legal AI systems.
arXiv cs.AI · 9 d ago · Agents
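One way to picture a learning loop that improves an agent without touching model weights is a text memory that is read before each task and updated after it. The sketch below is a minimal illustration under that assumption, not Parthenon's actual design: `SkillMemory`, `build_prompt`, `run_matter`, and the `agent` and `reviewer` callables are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class SkillMemory:
    """Text lessons grouped by task type; model weights are never modified."""
    notes: dict[str, list[str]] = field(default_factory=dict)

    def recall(self, task_type: str) -> list[str]:
        return list(self.notes.get(task_type, []))

    def learn(self, task_type: str, lesson: str) -> None:
        bucket = self.notes.setdefault(task_type, [])
        if lesson not in bucket:  # skip lessons that are already stored
            bucket.append(lesson)


def build_prompt(task: str, lessons: list[str]) -> str:
    """Prepend recalled lessons to the task description."""
    if not lessons:
        return f"Task: {task}"
    bullet_list = "\n".join(f"- {lesson}" for lesson in lessons)
    return f"Lessons from earlier matters:\n{bullet_list}\n\nTask: {task}"


def run_matter(
    memory: SkillMemory,
    task_type: str,
    task: str,
    agent: Callable[[str], str],
    reviewer: Callable[[str, str], Optional[str]],
) -> str:
    """Run one task, then store whatever lesson the reviewer extracts."""
    output = agent(build_prompt(task, memory.recall(task_type)))
    lesson = reviewer(task, output)  # None means nothing new to learn
    if lesson:
        memory.learn(task_type, lesson)
    return output
```

In use, `agent` would wrap an LLM call and `reviewer` would wrap whatever critique step produces the lesson; both are left as plain callables here so the loop itself stays visible.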
3. Adaptive Teacher Exposure for Self-Distillation in LLM Reasoning
The paper introduces Adaptive Teacher Exposure for Self-Distillation (ATESD), which optimizes how much of the reference reasoning the teacher model sees during on-policy self-distillation to improve reasoning in large language models (LLMs). ATESD uses a learnable Beta-policy controller to adjust the teacher's exposure dynamically. On the AIME 24, AIME 25, and HMMT 25 benchmarks with Qwen3 models (1.7B, 4B, and 8B parameters), it reports significant gains over existing self-distillation and reinforcement learning methods. The work highlights the value of adaptive exposure strategies and gives practitioners a new mechanism for tuning training to improve reasoning.
arXiv cs.AI · 9 d ago · Training
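To make the idea of a learnable Beta-policy controller concrete, the sketch below samples an exposure fraction from a Beta distribution, reveals that share of the reference reasoning, and nudges the distribution with a score-function (REINFORCE-style) update. The summary does not describe ATESD's actual objective, so the update rule, the running baseline, the finite-difference gradient, and every name here (`BetaExposureController`, `expose`) are assumptions made for illustration.

```python
import math
import random


def softplus(x: float) -> float:
    return math.log1p(math.exp(x))


def beta_log_pdf(x: float, a: float, b: float) -> float:
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return log_norm + (a - 1) * math.log(x) + (b - 1) * math.log(1 - x)


def expose(reference_steps: list[str], fraction: float) -> list[str]:
    """Show the teacher only the first `fraction` of the reference reasoning."""
    return reference_steps[: round(fraction * len(reference_steps))]


class BetaExposureController:
    """Learnable Beta(a, b) policy over an exposure fraction in (0, 1)."""

    def __init__(self, lr: float = 0.05, seed: int = 0):
        self.theta = [0.0, 0.0]  # unconstrained; softplus keeps a and b positive
        self.lr = lr
        self.baseline = 0.0  # running mean reward, used to reduce variance
        self.rng = random.Random(seed)

    def params(self) -> tuple[float, float]:
        return softplus(self.theta[0]) + 1e-3, softplus(self.theta[1]) + 1e-3

    def sample(self) -> float:
        a, b = self.params()
        x = self.rng.betavariate(a, b)
        return min(max(x, 1e-6), 1 - 1e-6)  # keep log(x) and log(1 - x) finite

    def update(self, x: float, reward: float, eps: float = 1e-4) -> None:
        advantage = reward - self.baseline
        self.baseline += 0.1 * (reward - self.baseline)
        grads = []
        for i in range(2):
            # Central finite difference of log p(x) with respect to theta[i].
            original = self.theta[i]
            self.theta[i] = original + eps
            up = beta_log_pdf(x, *self.params())
            self.theta[i] = original - eps
            down = beta_log_pdf(x, *self.params())
            self.theta[i] = original
            grads.append((up - down) / (2 * eps))
        for i in range(2):
            self.theta[i] += self.lr * advantage * grads[i]
```

A training loop would call `sample()`, pass `expose(reference_steps, fraction)` to the teacher, measure some reward for the resulting distillation step, and feed it back through `update()`. How that reward is defined is the part this sketch leaves open.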
the full briefing
Models & Releases
**AgentPLM** advances protein sequence design, using Reasoning-Augmented Decoding (RAD) and Contrastive Agent Policy Optimisation (CAPO) to achieve state-of-the-art results in antibody optimization. Meanwhile, **Parthenon** presents a self-evolving legal-agent framework that enhances legal-domain LLMs through a learning loop, demonstrating improved performance in legal tasks.
Training & Optimization
The paper on **Adaptive Teacher Exposure for Self-Distillation** introduces a method for optimizing teacher model exposure during self-distillation, leading to improved reasoning capabilities in LLMs. Another notable contribution is the **AMEL** framework, which shows how prior conversation history affects LLM judgments and reveals biases that can skew evaluations.
Safety & Security
Research on **Agentic Misalignment** addresses the challenges of multi-agent systems in automated workflows, proposing a new alignment paradigm to improve collaboration among agents. Additionally, the **PhantomBench** benchmark evaluates hallucination rates in language models, emphasizing the need for more reliable LLM outputs.
Practical Impact
Ongoing developments in LLMs across legal, biomedical, and safety-critical domains highlight the importance of continuous improvement and adaptation in AI technologies. Practitioners who stay informed about these advances are better placed to apply the latest tools and methods.