Training
MemPO: Self-Memory Policy Optimization for Long-Horizon Agents
The article introduces MemPO, a self-memory policy optimization algorithm for long-horizon agents in which the policy model itself summarizes and manages its memory while interacting with the environment, improving both performance and stability. MemPO refines credit assignment based on memory effectiveness, reporting an F1 score increase of 25.98 over the base model and 7.1 over the previous state of the art, while reducing token usage by 67.58% to 73.12%. For practitioners, this means more efficient memory management and potentially better performance on complex tasks under limited computational resources.
memory, optimization, agents
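
To make the token-saving idea in the summary concrete, here is a minimal sketch of why a compact, agent-maintained memory keeps the context bounded while a full interaction history grows with every step. It is not MemPO's implementation: `summarize` is a hypothetical stand-in for the model-written memory (it simply keeps the most recent words), `run_episode` is an illustrative name, and whitespace-separated word counts stand in for tokens. The credit-assignment part of the method is not modeled here.

```python
from typing import List, Tuple


def summarize(memory: str, observation: str, max_words: int = 30) -> str:
    """Hypothetical memory update: keep only the most recent `max_words` words.

    In a real agent the policy model would write this summary itself.
    """
    words = (memory + " " + observation).split()
    return " ".join(words[-max_words:])


def run_episode(
    task: str, observations: List[str], max_words: int = 30
) -> Tuple[List[int], List[int]]:
    """Return per-step context sizes (in words) for two strategies:
    appending the full history vs. carrying a bounded memory."""
    memory = ""
    history: List[str] = []
    full_sizes: List[int] = []
    memory_sizes: List[int] = []
    for obs in observations:
        history.append(obs)
        # Strategy A: the context holds the task plus every observation so far.
        full_context = " ".join([task] + history)
        # Strategy B: the context holds the task, the memory, and the latest observation.
        memory_context = " ".join([task, memory, obs])
        full_sizes.append(len(full_context.split()))
        memory_sizes.append(len(memory_context.split()))
        # Fold the latest observation into the memory for the next step.
        memory = summarize(memory, obs, max_words)
    return full_sizes, memory_sizes


if __name__ == "__main__":
    steps = [" ".join(["w"] * 20)] * 10  # ten observations of 20 words each
    full, mem = run_episode("find the answer", steps)
    print("full-history context sizes:", full)
    print("memory-based context sizes:", mem)
```

In this toy setup the full-history context grows linearly with the number of steps, while the memory-based context is capped at the task length plus `max_words` plus one observation. The savings reported in the summary depend on the actual model, tasks, and learned summaries, none of which this sketch reproduces.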