Did You Forget What I Asked? Prospective Memory Failures in Large Language Models
The study examines prospective memory failures in large language models (LLMs) by measuring how well they comply with formatting instructions while under concurrent task load. Across three model families, compliance decreases by 2-21% as task complexity increases. The effect is strongest under terminal constraints, which can reduce compliance by up to 50%. A salience-enhanced format restores compliance to 90-100%. The study also finds that formatting constraints can themselves hurt task accuracy, as shown by a drop in GSM8K accuracy from 93% to 27%. For practitioners, these results underscore that task design and instruction clarity directly affect LLM performance.
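To make the compliance metric concrete, here is a minimal sketch of how a format-compliance rate and its drop under load could be computed. It is not the study's evaluation code. The checker `ends_with_marker`, the `END` marker, and the toy responses are all hypothetical, and the reading of a "terminal constraint" as a requirement on how a response ends is an assumption.

```python
from typing import Callable, Iterable


def ends_with_marker(response: str, marker: str = "END") -> bool:
    """Hypothetical terminal constraint: the last non-blank line must equal `marker`."""
    lines = [line.strip() for line in response.splitlines() if line.strip()]
    return bool(lines) and lines[-1] == marker


def compliance_rate(responses: Iterable[str], check: Callable[[str], bool]) -> float:
    """Return the fraction of responses that satisfy the formatting check."""
    results = [check(r) for r in responses]
    if not results:
        raise ValueError("no responses to score")
    return sum(results) / len(results)


# Toy, made-up responses (not data from the study).
baseline = ["The answer is 4.\nEND", "Paris.\nEND", "Blue.\nEND", "7\nEND"]
loaded = [
    "Step 1: add the terms.\nThe answer is 42.",  # forgot the marker
    "x = 7\nEND",
    "Total cost: 18 dollars.\nEND",
    "The train arrives at noon.\nEND",
]

baseline_rate = compliance_rate(baseline, ends_with_marker)
loaded_rate = compliance_rate(loaded, ends_with_marker)

# Drop expressed in percentage points between the two conditions.
drop_points = (baseline_rate - loaded_rate) * 100
print(f"baseline={baseline_rate:.2f} loaded={loaded_rate:.2f} drop={drop_points:.0f} points")
```

Passing the checker in as a function keeps the rate calculation independent of any one formatting rule, so the same scoring could be reused for other constraints.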