Research
TVIR: Building Deep Research Agents Towards Text-Visual Interleaved Report Generation
The article introduces TVIR (Text-Visual Interleaved Report Generation) and its associated benchmark, TVIR-Bench, which comprises 100 expert-curated multimodal tasks whose reports must integrate visual elements with analytical text. It also presents TVIR-Agent, a hierarchical multi-agent system that generates reports by constructing outlines, retrieving images, and creating charts. A dual-path evaluation framework assesses the textual and visual components of the output. Together, the benchmark and agent highlight the need for multimodal approaches to evidence-driven report generation and offer a baseline for future research on deep research agents.
deep-research-agents, report-generation, multimodal