Agents
ToolMenuBench: Benchmarking Tool-Menu Filtering Strategies for Reliable and Efficient LLM Agents
ToolMenuBench is a benchmark for evaluating how tool menus are constructed for multi-step large language model (LLM) agents, measuring reliability, efficiency, and risk exposure. It compares configurations that vary tool-menu size and filtering method. In the reported results, Causal Minimal Tool Filtering (CMTF) raises task success from 32.1% to 85.7% while cutting token usage by approximately 98%. For practitioners, these results inform agent-interface design, in particular which tools to make visible to an agent given their effect on performance and safety.
benchmarking, tool-menu, LLM