RAG
ToolSense: A Diagnostic Framework for Auditing Parametric Tool Knowledge in LLMs
ToolSense is an open-source diagnostic framework for auditing parametric tool knowledge in large language models (LLMs), motivated by the limitations of embedding-based retrieval methods. It introduces three benchmarks for evaluating model performance on ambiguous queries: the Realistic Retrieval Benchmark (RRB), multiple-choice question (MCQ) probing, and question-answering (QA) probing. Together they reveal a significant knowledge-retrieval dissociation across several parametric model configurations when tested against standard benchmarks such as ToolBench. For practitioners, the framework indicates how well an LLM actually understands tools, beyond its retrieval performance alone, which can inform fine-tuning and deployment decisions.
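To make the idea of a knowledge-retrieval dissociation concrete, the following is a minimal sketch of how per-tool probing and retrieval outcomes could be compared. It is not ToolSense's actual API or metric definition: the `ToolResult` record, its fields, the tool names, and the `dissociation_summary` function are all hypothetical names introduced here for illustration, and the sample outcomes are made up.

```python
from dataclasses import dataclass


@dataclass
class ToolResult:
    """Hypothetical per-tool outcome record (not a ToolSense type)."""
    tool: str
    retrieved: bool    # assumed: model surfaced the right tool for an ambiguous query
    mcq_correct: bool  # assumed: model answered an MCQ probe about the tool correctly


def dissociation_summary(results):
    """Compare probing accuracy with retrieval accuracy over the same tools."""
    n = len(results)
    if n == 0:
        raise ValueError("results must not be empty")
    return {
        "retrieval_acc": sum(r.retrieved for r in results) / n,
        "probe_acc": sum(r.mcq_correct for r in results) / n,
        # Tools the model appears to know about but fails to retrieve.
        "knows_not_retrieves": sum(r.mcq_correct and not r.retrieved for r in results) / n,
        # Tools the model retrieves without passing the knowledge probe.
        "retrieves_not_knows": sum(r.retrieved and not r.mcq_correct for r in results) / n,
    }


# Illustrative, made-up outcomes for four hypothetical tools.
sample = [
    ToolResult("weather_lookup", retrieved=True, mcq_correct=True),
    ToolResult("currency_convert", retrieved=False, mcq_correct=True),
    ToolResult("flight_search", retrieved=False, mcq_correct=True),
    ToolResult("pdf_merge", retrieved=True, mcq_correct=False),
]
summary = dissociation_summary(sample)
print(summary)
```

Reporting both off-diagonal rates, rather than only the gap between the two accuracies, keeps the two failure modes separate: two models can share the same accuracy gap while failing on entirely different tools.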
llm, tool_retrieval, diagnostic_framework