PRISMR: Overcoming Parse Collapse in Multimodal Listwise Ranking via Parameterized Representation Internalization
This article introduces PRISMR (Parameterized Representation Internalization for Semantic Multimodal Ranking), a framework that addresses parse collapse in generative listwise ranking with Large Multimodal Models (LMMs). A lightweight hypernetwork encodes each multimodal candidate and generates item-specific LoRA weights, which helps the model internalize list structure while preserving the integrity of the base model. The framework significantly improves ranking performance on a newly established large-scale multimodal review-ranking benchmark and transfers effectively across domains and instruction-tuned architectures, making it relevant to practitioners who apply LMMs to complex multimodal ranking tasks.
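
To make the hypernetwork-to-LoRA idea concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the class name `LoRAHypernetwork`, the helper `adapted_forward`, the single linear head per factor, the dimensions, and the zero initialization of the `B` head (borrowed from common LoRA practice) are all assumptions made for illustration. It shows only the general pattern: an item embedding is mapped to low-rank factors `A` and `B`, and the frozen base weight `W` is used as `W x + B (A x)` without being modified.

```python
import random


def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


class LoRAHypernetwork:
    """Hypothetical hypernetwork: item embedding -> LoRA factors (A, B)."""

    def __init__(self, emb_dim, d_in, d_out, rank, seed=0):
        rng = random.Random(seed)
        self.d_in, self.d_out, self.rank = d_in, d_out, rank
        # Linear head producing the flattened A factor (rank x d_in).
        self.head_a = [[rng.gauss(0.0, 0.1) for _ in range(emb_dim)]
                       for _ in range(rank * d_in)]
        # Assumption: the B head starts at zero, so the initial update is zero
        # and the adapted layer initially behaves exactly like the base layer.
        self.head_b = [[0.0] * emb_dim for _ in range(d_out * rank)]

    def generate(self, item_emb):
        """Return item-specific factors A (rank x d_in) and B (d_out x rank)."""
        flat_a = matvec(self.head_a, item_emb)
        flat_b = matvec(self.head_b, item_emb)
        A = [flat_a[i * self.d_in:(i + 1) * self.d_in]
             for i in range(self.rank)]
        B = [flat_b[i * self.rank:(i + 1) * self.rank]
             for i in range(self.d_out)]
        return A, B


def adapted_forward(W, A, B, x, scale=1.0):
    """Compute W x + scale * B (A x); W is read but never modified."""
    base = matvec(W, x)
    delta = matvec(B, matvec(A, x))
    return [b + scale * d for b, d in zip(base, delta)]


if __name__ == "__main__":
    W = [[1.0, 0.0], [0.0, 1.0]]          # frozen base weight (toy size)
    hyper = LoRAHypernetwork(emb_dim=3, d_in=2, d_out=2, rank=1)
    A, B = hyper.generate([0.5, -1.0, 2.0])  # toy candidate embedding
    print(adapted_forward(W, A, B, [3.0, 4.0]))
```

In a real system the embedding would come from a multimodal encoder and the heads would be trained; the point of the sketch is only that per-item adaptation lives in the small generated factors, not in the base weights.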