Research · arXiv cs.CL · 8 d ago

TVIR: Building Deep Research Agents Towards Text-Visual Interleaved Report Generation

The paper introduces TVIR (Text-Visual Interleaved Report Generation) and an accompanying benchmark, TVIR-Bench, which comprises 100 expert-curated multimodal tasks whose analytical reports must integrate visual elements with text. It also presents TVIR-Agent, a hierarchical multi-agent system that generates reports by constructing outlines, retrieving images, and creating charts. A dual-path evaluation framework assesses both the textual and the visual components of the generated reports. The work argues that evidence-driven report generation needs multimodal approaches, and it offers TVIR-Agent as a baseline for future research on deep research agents.

deep-research-agents · report-generation · multimodal
