Less Context, Better Agents: Efficient Context Engineering for Long-Horizon Tool-Using LLM Agents
The paper presents a study on optimizing context management for large language models (LLMs) in enterprise workflows, specifically focusing on automated expense itemization in Microsoft Dynamics 365. Evaluating four configurations of GPT-5 on a 50-task hotel expense benchmark, the study finds that context pruning combined with automated summarization yields the best performance, achieving 91.6% complete itemization with a reduced token count of 553,374 and a runtime of 5.79 hours. This research highlights the importance of efficient context engineering, demonstrating that selective retention and summarization can significantly enhance the reliability and efficiency of LLMs in long-horizon tool-using scenarios, which is critical for practitioners aiming to optimize resource usage in AI applications.