Safety
Does AI Reviewer See the Full Picture? Attacking and Defending Multimodal Peer Review
The article announces the release of PaperGuard, a comprehensive benchmark for evaluating and defending against adversarial attacks on AI-generated peer reviews, particularly in multimodal contexts where both text and figures are critical. It introduces a new multimodal dataset and a suite of targeted attack strategies, including black-box prompt injections and white-box perturbations, alongside a defense mechanism utilizing chunk-based embedding search to address vulnerabilities. This work is significant for practitioners as it lays the groundwork for developing more robust and trustworthy AI systems in scholarly review processes, highlighting the need for defenses against domain-specific manipulation.
peer reviewmultimodalai risks