ChartAgent: A Multimodal Agent for Visually Grounded Reasoning in Complex Chart Question Answering
ChartAgent is a multimodal framework for visually grounded reasoning in chart question answering, aimed specifically at the challenges posed by unannotated charts. It iteratively decomposes a query into visual subtasks and carries them out with specialized actions such as annotation and cropping. It achieves state-of-the-art performance on benchmarks such as ChartBench and ChartX, with up to a 16.07% absolute improvement, and is particularly strong on unannotated, numerically intensive queries. The framework improves performance across chart types and reasoning complexities and also works as a plug-and-play addition to existing LLMs, making it relevant to practitioners working on visual reasoning.
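To make the iterative decomposition idea concrete, here is a minimal sketch of an agent loop that repeatedly picks a visual action (crop or annotate) until it can answer. It is not ChartAgent's implementation: the chart is a toy grid of integers rather than an image, the planner is a hard-coded script standing in for a multimodal LLM, and every name (`Step`, `AgentState`, `run_agent`, `scripted_planner`, `crop`, `annotate`) is hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

# Toy stand-in for a chart image: rows of integer "pixel" values (assumption).
Image = List[List[int]]


def crop(image: Image, box: Tuple[int, int, int, int]) -> Image:
    """Return the sub-image inside box = (top, left, bottom, right), end-exclusive."""
    top, left, bottom, right = box
    return [row[left:right] for row in image[top:bottom]]


def annotate(image: Image, marker: int) -> Image:
    """Return a copy in which the largest value is replaced by `marker`."""
    peak = max(max(row) for row in image)
    return [[marker if px == peak else px for px in row] for row in image]


@dataclass
class Step:
    action: str              # "crop", "annotate", or "answer"
    argument: object = None  # box for crop, marker for annotate, result for answer


@dataclass
class AgentState:
    image: Image
    trace: List[str] = field(default_factory=list)  # actions taken so far


def run_agent(
    image: Image,
    planner: Callable[[AgentState], Step],
    max_steps: int = 5,
) -> Tuple[Optional[object], List[str]]:
    """Ask the planner for one step at a time until it answers or steps run out."""
    state = AgentState(image=image)
    for _ in range(max_steps):
        step = planner(state)
        state.trace.append(step.action)
        if step.action == "answer":
            return step.argument, state.trace
        if step.action == "crop":
            state.image = crop(state.image, step.argument)
        elif step.action == "annotate":
            state.image = annotate(state.image, step.argument)
        else:
            raise ValueError(f"unknown action: {step.action}")
    return None, state.trace


def scripted_planner(state: AgentState) -> Step:
    """Hard-coded plan (hypothetical): crop the plot area, mark the peak, report it."""
    if not state.trace:
        return Step("crop", (1, 1, 3, 3))
    if state.trace == ["crop"]:
        return Step("annotate", -1)
    for r, row in enumerate(state.image):
        for c, px in enumerate(row):
            if px == -1:
                return Step("answer", (r, c))
    return Step("answer", None)


chart = [
    [0, 0, 0, 0],
    [0, 3, 7, 0],
    [0, 5, 2, 0],
    [0, 0, 0, 0],
]
answer, trace = run_agent(chart, scripted_planner)
print(answer, trace)
```

In this sketch the planner only sees the current state, so each action's output becomes the input to the next decision. A real system would presumably replace `scripted_planner` with a model call and the grid operations with image tools.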