Multimodal · arXiv cs.AI · 12 days ago

Plug-and-Adapt: Multimodal Coreference Resolution at First Sight with a Pretrained Alignment Model

The paper introduces a plug-and-adapt method for Multimodal Coreference Resolution (MCR) that uses a pretrained alignment model, avoiding extensive training on task-specific datasets. A fine-grained alignment model is first pretrained on vision-language datasets and then adapted to MCR through similarity aggregation. On the Coreference Image Narratives (CIN) benchmark, the approach improves CoNLL F1 by 5.31% over state-of-the-art methods and by 2.12% over popular Vision-Language Large Models (VLLMs). For practitioners, the method matters because it reduces dependence on resource-intensive models and annotated data, which makes real-world deployment easier.
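The reported gains are in CoNLL F1, which in coreference evaluation is conventionally the unweighted mean of the MUC, B³, and entity-based CEAF (CEAF_e) F1 scores. The sketch below computes it for toy clusters. It is a didactic reimplementation, not the official CoNLL reference scorer and not code from the paper: it assumes gold and predicted coreference chains are given as sets of hypothetical integer mention ids (no span matching), and it finds the CEAF_e alignment by brute force, so it only suits small examples.

```python
from itertools import permutations

# A coreference chain is a frozenset of mention ids; a clustering is a list of chains.
Chains = list[frozenset]


def f1(p: float, r: float) -> float:
    return 0.0 if p + r == 0 else 2 * p * r / (p + r)


def muc_recall(key: Chains, response: Chains) -> float:
    # For each key chain, count how many pieces the response splits it into.
    num = den = 0
    for k in key:
        parts = set()
        for m in k:
            owner = next((i for i, r in enumerate(response) if m in r), None)
            # A mention absent from the response forms its own piece.
            parts.add(owner if owner is not None else ("missing", m))
        num += len(k) - len(parts)
        den += len(k) - 1
    return num / den if den else 0.0


def muc(key: Chains, response: Chains) -> float:
    # MUC precision is MUC recall with key and response swapped.
    return f1(muc_recall(response, key), muc_recall(key, response))


def b_cubed_recall(key: Chains, response: Chains) -> float:
    # Average, over key mentions, of the overlap between the mention's key
    # chain and its response chain, normalized by the key chain size.
    total, count = 0.0, 0
    for k in key:
        for m in k:
            r = next((r for r in response if m in r), frozenset())
            total += len(k & r) / len(k)
            count += 1
    return total / count if count else 0.0


def b_cubed(key: Chains, response: Chains) -> float:
    return f1(b_cubed_recall(response, key), b_cubed_recall(key, response))


def phi4(k: frozenset, r: frozenset) -> float:
    # Entity similarity used by CEAF_e.
    return 2 * len(k & r) / (len(k) + len(r))


def ceaf_e(key: Chains, response: Chains) -> float:
    if not key or not response:
        return 0.0
    small, large = (key, response) if len(key) <= len(response) else (response, key)
    # Brute-force the best one-to-one alignment between chains.
    best = max(
        sum(phi4(small[i], large[j]) for i, j in enumerate(perm))
        for perm in permutations(range(len(large)), len(small))
    )
    return f1(best / len(response), best / len(key))


def conll_f1(key: Chains, response: Chains) -> float:
    return (muc(key, response) + b_cubed(key, response) + ceaf_e(key, response)) / 3


gold = [frozenset({1, 2, 3}), frozenset({4, 5})]
pred = [frozenset({1, 2}), frozenset({3, 4, 5})]
print(round(conll_f1(gold, pred), 3))  # 0.733
```

For these toy chains, MUC, B³, and CEAF_e F1 come out to 2/3, 11/15, and 0.8, so CoNLL F1 is 11/15 (about 0.733). A "5.31%" gain on this scale means the average of the three F1 scores rose by that amount.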

