Agents
EvoArena: Tracking Memory Evolution for Robust LLM Agents in Dynamic Environments
EvoArena is a benchmark suite for evaluating large language model (LLM) agents in dynamic environments. It simulates progressive updates across terminal, software, and social domains. The study also presents EvoMem, a patch-based memory paradigm that improves agents' reasoning about environmental changes, yielding a 1.5% performance improvement on EvoArena and significant gains on standard benchmarks such as GAIA and LoCoMo. The work argues that memory evolution and dynamic modeling are necessary for the robust deployment of LLMs in real-world applications.
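To make the idea of a patch-based memory concrete, here is a minimal sketch. It is not EvoMem's actual design, which this summary does not describe. It assumes one plausible reading of "patch-based": the agent stores each observed change as a small patch (old value, new value, step) instead of overwriting the fact, so it can recover both the current state and how the environment got there. The names `Patch`, `PatchMemory`, `observe`, `current`, and `history` are hypothetical and chosen only for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Patch:
    """One recorded change to a fact (hypothetical structure, not from the paper)."""
    key: str
    old: Optional[str]   # value before the change; None if the fact was unknown
    new: Optional[str]   # value after the change; None means the fact was removed
    step: int            # environment step at which the change was observed


@dataclass
class PatchMemory:
    """Append-only log of patches; the state is derived by replaying them."""
    patches: list[Patch] = field(default_factory=list)

    def observe(self, key: str, value: Optional[str], step: int) -> Optional[Patch]:
        """Record a patch only when the observation differs from the stored value."""
        old = self.current().get(key)
        if old == value:
            return None  # nothing changed, so nothing is stored
        patch = Patch(key, old, value, step)
        self.patches.append(patch)
        return patch

    def current(self, as_of: Optional[int] = None) -> dict[str, str]:
        """Replay patches, optionally only those with step <= as_of."""
        state: dict[str, str] = {}
        for p in self.patches:
            if as_of is not None and p.step > as_of:
                continue
            if p.new is None:
                state.pop(p.key, None)
            else:
                state[p.key] = p.new
        return state

    def history(self, key: str) -> list[Patch]:
        """Return every recorded change to one fact, in insertion order."""
        return [p for p in self.patches if p.key == key]


# Example: a command-line flag is renamed between two environment versions.
mem = PatchMemory()
mem.observe("tool.flag", "--out", step=1)
mem.observe("tool.flag", "--output", step=2)
print(mem.current())          # state after the update
print(mem.current(as_of=1))   # state before the update
print(mem.history("tool.flag"))
```

The design choice in this sketch is that the log is never rewritten: an agent can answer "what is true now" from `current()` and "what changed, and when" from `history()`, which is the kind of reasoning about environmental change the summary attributes to a patch-based memory.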
llm, benchmark, memory