Research
MemTrace: Probing What Final Accuracy Misses in Long-Term Memory
MemTrace is a benchmark for long-term memory in LLM agents that scores individual knowledge points (specific facts about users) instead of relying on accuracy aggregated over question episodes. It breaks memory performance down along three dimensions: memory age, question type, and evidence condition. The results show that many systems struggle with using evidence rather than with retrieving it, which points to evidence processing and utilization as the place where long-term memory in AI systems most needs to improve.
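To make the idea of scoring per knowledge point concrete, here is a minimal sketch of how results could be sliced along one dimension and how failures could be split into retrieval and utilization errors. The record fields, the category values, and both helper functions are assumptions made for illustration; they are not MemTrace's actual schema or scoring code.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class KnowledgePointResult:
    # Hypothetical record: field names and values are illustrative assumptions,
    # not the benchmark's real data format.
    memory_age: str          # e.g. "recent" or "old"
    question_type: str       # e.g. "single_fact" or "update"
    evidence_condition: str  # e.g. "explicit" or "implicit"
    retrieved: bool          # was the supporting evidence retrieved?
    correct: bool            # was the final answer correct?


def accuracy_by(results, dimension):
    """Accuracy per value of one dimension, computed over knowledge points."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for r in results:
        key = getattr(r, dimension)
        totals[key] += 1
        hits[key] += int(r.correct)
    return {key: hits[key] / totals[key] for key in totals}


def failure_breakdown(results):
    """Split wrong answers into retrieval failures and utilization failures."""
    counts = {"retrieval": 0, "utilization": 0}
    for r in results:
        if r.correct:
            continue
        # Evidence was retrieved but the answer is still wrong: utilization failure.
        counts["utilization" if r.retrieved else "retrieval"] += 1
    return counts


# Toy data, invented purely to exercise the functions.
results = [
    KnowledgePointResult("recent", "single_fact", "explicit", True, True),
    KnowledgePointResult("recent", "update", "implicit", True, False),
    KnowledgePointResult("old", "single_fact", "explicit", True, False),
    KnowledgePointResult("old", "update", "implicit", False, False),
]

print(accuracy_by(results, "memory_age"))
print(failure_breakdown(results))
```

Keeping one record per knowledge point, rather than one score per question episode, is what allows the same results to be regrouped by any of the three dimensions without re-running the evaluation.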
memory, llm, evaluation