Coding
MatchLM2Lite: A Scalable MLLM-to-Lite Framework for Reproduced Content Identification
MatchLM2Lite is a real-time framework for reproduced content identification (RCI) that uses a multimodal large language model (MLLM) to improve content authenticity on video platforms. The system has two modules: MatchLM, a high-capacity teacher model with an F1-score gain of +8.57, and MatchLite, a distilled student model that retains a +6.55 F1-score gain while cutting computational cost by 35x. The design supports efficient pairwise multimodal RCI with low-latency inference, which makes it suitable for integration into real-time recommendation systems. The reported result is a 2.5% reduction in reproduced video views with no negative impact on user engagement.
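The summary does not say how MatchLite is distilled from MatchLM, so the following is only a minimal sketch of one common teacher-student setup for a pairwise match classifier, not the authors' method. It assumes the teacher emits a match probability for a video pair and the student emits a logit; the function names, the blended binary cross-entropy objective, and the `alpha` weight are all hypothetical choices made for illustration.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def bce(p: float, target: float, eps: float = 1e-7) -> float:
    """Binary cross-entropy between a predicted probability and a target in [0, 1]."""
    p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))


def distillation_loss(student_logit: float,
                      teacher_prob: float,
                      label: int,
                      alpha: float = 0.5) -> float:
    """Hypothetical distillation objective for one video pair.

    Blends a hard-label term (ground-truth match / no match) with a
    soft-label term (the teacher's match probability). `alpha` is an
    assumed mixing weight, not a value from the source.
    """
    p_student = sigmoid(student_logit)
    hard = bce(p_student, float(label))   # fit the ground-truth label
    soft = bce(p_student, teacher_prob)   # imitate the teacher's output
    return alpha * hard + (1.0 - alpha) * soft


if __name__ == "__main__":
    # A student that agrees with a confident teacher is penalized less
    # than one that disagrees.
    agree = distillation_loss(student_logit=3.0, teacher_prob=0.95, label=1)
    disagree = distillation_loss(student_logit=-3.0, teacher_prob=0.95, label=1)
    print(f"agree={agree:.4f} disagree={disagree:.4f}")
```

In a setup like this, the expensive teacher runs only offline to produce `teacher_prob` targets, and the small student alone serves requests, which is where a cost reduction of the kind described would come from.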
content moderation, reproduced content, mllm