Research · arXiv cs.CL · 15 d ago

FutureOmni: Evaluating Future Forecasting from Omni-Modal Context for Multimodal LLMs

FutureOmni is presented as the first benchmark for evaluating omni-modal future forecasting in Multimodal Large Language Models (MLLMs), that is, predicting future events from audio-visual cues. It comprises 919 videos and 1,034 multiple-choice QA pairs across 8 domains. Evaluation shows that existing models struggle, particularly in speech-heavy contexts; the highest reported accuracy is 64.8%, achieved by Gemini 3 Flash. The benchmark is accompanied by a 7K-sample instruction-tuning dataset and an Omni-Modal Future Forecasting (OFF) training strategy, which together improve future forecasting and generalization in MLLMs. All resources are publicly available.

multimodal · forecasting · llm
