Research
MBABench: Evaluating LLM Agents on End-to-End Spreadsheet Tasks in Finance
The article introduces MBABench, a benchmark for evaluating LLM agents on end-to-end spreadsheet tasks in finance. It addresses a gap left by existing benchmarks, which focus on simpler tasks. The evaluation framework scores outputs along three dimensions (Accuracy, Formula, and Format), judging their quality against professional standards. Results indicate that the Claude family of models performs best at producing professional-looking spreadsheets but still struggles with complex workflows, highlighting the need for further advances in LLM capabilities for practical financial applications.
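To make the three dimensions concrete, here is a minimal sketch of what a per-cell check along Accuracy, Formula, and Format might look like. It is an illustration only, not MBABench's actual scoring code: the `Cell` type, the `score_cell` function, the tolerance, and the specific pass/fail rules are all assumptions introduced for this example.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Cell:
    """Hypothetical snapshot of one spreadsheet cell (not a real library type)."""
    value: float            # computed value displayed in the cell
    formula: Optional[str]  # e.g. "=B2-B3", or None if the value is hardcoded
    number_format: str      # e.g. "#,##0.00"


def score_cell(output: Cell, reference: Cell, tol: float = 1e-6) -> Dict[str, bool]:
    """Compare an agent's cell to a reference cell on three assumed criteria."""
    return {
        # Accuracy (assumed rule): the numeric value matches within a tolerance.
        "accuracy": abs(output.value - reference.value) <= tol,
        # Formula (assumed rule): a formula is used wherever the reference uses one.
        "formula": (output.formula is not None) == (reference.formula is not None),
        # Format (assumed rule): the number format string is identical.
        "format": output.number_format == reference.number_format,
    }


# Reference cell computes net income with a live formula.
reference = Cell(value=1250.0, formula="=B2-B3", number_format="#,##0.00")
# Agent output has the right number and format, but the value is hardcoded.
output = Cell(value=1250.0, formula=None, number_format="#,##0.00")

scores = score_cell(output, reference)
print(scores)
```

Under these assumed rules, the hardcoded cell passes Accuracy and Format but fails Formula, which is the kind of distinction a value-only comparison would miss.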
llm, agents, spreadsheet