Multimodal | arXiv cs.AI | 8 days ago

CoVEBench: Can Video Editing Models Handle Complex Instructions?

CoVEBench is a new benchmark for evaluating text-guided video editing models on complex, compositional editing tasks. It contains 416 source videos, 626 multi-point editing instructions, and 9,990 fine-grained checklist items. These are used to assess instruction compliance and video fidelity through both MLLM-based judgment and automated metrics. The benchmark exposes a key limitation of current models: they struggle to apply multiple edits at once while preserving unrelated content. This underscores the need for better video editing methods that align with real-world user requirements.
