RAG
MMLongEmbed: Benchmarking Multimodal Embedding Models in Long-Context Scenarios
MMLongEmbed is presented as the first comprehensive benchmark for evaluating Multimodal Embedding Models (MEMs) in long-context scenarios. It comprises four retrieval tasks spanning text, document, and video modalities. The evaluation finds that state-of-the-art models often rely on superficial feature matching and fail to capture deep semantic and structural dependencies, with performance degradation linked to context length and to where the relevant information is placed in the input. For practitioners, the results highlight the limitations of current architectures and the need for better strategies for handling long-context multimodal inputs.
benchmark, multimodal, embedding
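To make the point about information placement concrete, here is a minimal sketch of how retrieval quality could be broken down by where the relevant content sits in a long input. It is not MMLongEmbed's evaluation code: the summary above does not state the benchmark's metric or protocol, so recall@k, cosine similarity, the function names, the bucket labels ("start", "middle"), and the toy vectors are all assumptions chosen for illustration.

```python
import math
from collections import defaultdict


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recall_at_k_by_bucket(queries, corpus, k=1):
    """Compute recall@k separately for each bucket label.

    queries: list of (query_vector, gold_doc_id, bucket_label) tuples.
             The bucket label is a hypothetical annotation, e.g. where the
             relevant span sits in the long input, or a context-length bin.
    corpus:  dict mapping doc_id -> embedding vector.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for query_vec, gold_id, bucket in queries:
        # Rank every document by similarity to the query, best first.
        ranked = sorted(corpus, key=lambda d: cosine(query_vec, corpus[d]), reverse=True)
        totals[bucket] += 1
        if gold_id in ranked[:k]:
            hits[bucket] += 1
    return {bucket: hits[bucket] / totals[bucket] for bucket in totals}


# Toy embeddings standing in for real model outputs (assumed values).
corpus = {"d1": [1.0, 0.0], "d2": [0.0, 1.0], "d3": [1.0, 1.0]}
queries = [
    ([1.0, 0.1], "d1", "start"),
    ([0.1, 1.0], "d2", "middle"),
    ([1.0, 0.9], "d1", "middle"),  # pulled toward d3, so it misses at k=1
]

scores = recall_at_k_by_bucket(queries, corpus, k=1)
print(scores)
```

Comparing the per-bucket scores, rather than a single averaged number, is what would expose a placement or length effect of the kind the summary describes. In practice the vectors would come from the embedding model under test.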