ai-digest.dev
Agents · arXiv cs.AI · 4 d ago

Regimes: An Auditable, Held-Out-Gated Improvement Loop Demonstrated on LongMemEval with ActiveGraph

The article introduces "Regimes," an event-sourced framework for autonomous agent improvement loops that pairs a controlled workflow with an append-only event log. It demonstrates a held-out-gated improvement loop on the ActiveGraph runtime: the loop diagnoses evaluation failures and proposes repairs, and it reports held-out accuracy gains of +0.05 to +0.10 on LongMemEval-S across multiple splits. For practitioners, the result is a framework for auditing and refining agent performance, which makes reported improvements easier to validate and trust.
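To make "held-out-gated" concrete, here is a minimal sketch of the general idea, not the paper's implementation or ActiveGraph's API. Every name in it (`EventLog`, `gated_step`, `accuracy`, the `min_gain` threshold, the exact-match scoring) is a hypothetical chosen for illustration: a proposed repair replaces the current agent only if it scores higher on examples that were not used to diagnose failures, and each decision is recorded in an append-only log.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Sequence, Tuple

Example = Tuple[str, str]      # (question, expected answer); hypothetical format
Agent = Callable[[str], str]   # maps a question to an answer


def accuracy(agent: Agent, examples: Sequence[Example]) -> float:
    """Fraction of examples answered with an exact match."""
    if not examples:
        return 0.0
    return sum(agent(q) == a for q, a in examples) / len(examples)


@dataclass
class EventLog:
    """Append-only log: the only write operation is append()."""
    _events: List[dict] = field(default_factory=list)

    def append(self, kind: str, **payload) -> None:
        self._events.append({"seq": len(self._events), "kind": kind, **payload})

    def events(self) -> Tuple[dict, ...]:
        # Return a tuple so callers cannot append to or reorder the log.
        return tuple(self._events)


def gated_step(current: Agent, candidate: Agent,
               dev: Sequence[Example], held_out: Sequence[Example],
               log: EventLog, min_gain: float = 0.0) -> Agent:
    """Keep `candidate` only if it beats `current` on the held-out split."""
    # Failures are diagnosed on the dev split only, never on held-out data.
    failures = [q for q, a in dev if current(q) != a]
    log.append("failures_diagnosed", failures=failures)

    base = accuracy(current, held_out)
    new = accuracy(candidate, held_out)
    accepted = (new - base) > min_gain
    log.append("candidate_evaluated", baseline=base, candidate=new,
               accepted=accepted)
    return candidate if accepted else current
```

Because the gate reads only the held-out score, a repair that merely fits the diagnosed failures is rejected, and the log keeps both accepted and rejected attempts for later audit.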

autonomous improvement · event sourcing · agent runtime
relevance 0.00 · engagement 0.00