SAMA: Semantic Anchor-aligned Augmentation for Unified Low-Resource Multimodal Information Extraction
The article introduces Semantic Anchor-aligned Multimodal Augmentation (SAMA), a framework for improving Multimodal Information Extraction (MIE) tasks such as Multimodal Named Entity Recognition, Relation Extraction, and Event Extraction in low-resource settings. SAMA uses a Collaborative Multi-Experts Multimodal Large Language Model (CME-MLLM), equipped with a Universal Adapter and Task-Specific Adapters, to generate high-fidelity synthetic data. An Anchor-Preserving Diffusion mechanism handles image synthesis, and a Dual-Constraint Filtering module handles sample selection. The framework outperforms existing augmentation methods across multiple benchmark datasets, suggesting it can help mitigate data scarcity in multimodal AI applications.
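
To make the sample-selection step more concrete, here is a minimal sketch of what a two-constraint filter over synthetic text-image pairs could look like. The summary does not say which two constraints the Dual-Constraint Filtering module applies, so everything here is an assumption made for illustration: the `SyntheticSample` type, the `text_score` and `image_score` fields, the threshold values, and the function name `dual_constraint_filter` are all hypothetical and are not taken from the article.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SyntheticSample:
    """A generated text-image pair with two quality scores (hypothetical fields)."""
    text: str
    image_id: str
    text_score: float   # assumed: how well the text keeps the original entity/relation/event
    image_score: float  # assumed: how well the synthesized image agrees with the text


def dual_constraint_filter(
    samples: List[SyntheticSample],
    text_threshold: float = 0.7,   # illustrative value, not from the article
    image_threshold: float = 0.6,  # illustrative value, not from the article
) -> List[SyntheticSample]:
    """Keep only the samples that satisfy BOTH constraints."""
    return [
        s for s in samples
        if s.text_score >= text_threshold and s.image_score >= image_threshold
    ]


# Toy candidates with made-up scores, purely to exercise the filter.
candidates = [
    SyntheticSample("Alice visited Paris.", "img_001", text_score=0.9, image_score=0.8),
    SyntheticSample("Alice visited Paris.", "img_002", text_score=0.9, image_score=0.3),
    SyntheticSample("Someone went somewhere.", "img_003", text_score=0.4, image_score=0.9),
]

kept = dual_constraint_filter(candidates)
print([s.image_id for s in kept])
```

In this sketch a sample is discarded as soon as either score falls below its threshold, so a well-matched image cannot compensate for text that has drifted from the original annotation, and vice versa. Whether SAMA combines its constraints this way is not stated in the summary.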