Multimodal
MirrorCheck: Efficient Adversarial Defense for Vision-Language Models
MirrorCheck is a model-agnostic framework for detecting adversarial inputs to Vision-Language Models (VLMs). It uses a Text-to-Image (T2I) model to regenerate an image from the caption produced for the input, then compares the input and the regenerated image through their feature-space embeddings; low semantic consistency between the two suggests the input was adversarially perturbed. To blunt adaptive attacks, it adds a stochastic defense that draws on a diverse set of T2I generators and encoders, together with a One-Time-Use perturbation. The authors report superior performance across a range of adversarial scenarios and present the framework as a scalable way for practitioners to harden VLMs in real-world applications.
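The consistency check described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `mirror_check`, the callables `caption_fn`, `generators`, and `encoders`, and the threshold value of 0.7 are all hypothetical placeholders, and the One-Time-Use perturbation is not modeled here.

```python
import math
import random
from typing import Any, Callable, Optional, Sequence, Tuple

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as having no similarity
    return dot / (norm_a * norm_b)


def mirror_check(
    image: Any,
    caption_fn: Callable[[Any], str],                # hypothetical: VLM captioner
    generators: Sequence[Callable[[str], Any]],      # hypothetical: T2I models
    encoders: Sequence[Callable[[Any], Vector]],     # hypothetical: image encoders
    threshold: float = 0.7,                          # assumed value, needs tuning
    rng: Optional[random.Random] = None,
) -> Tuple[bool, float]:
    """Return (flagged, score); flagged is True when consistency is low."""
    rng = rng or random.Random()
    # Stochastic defense: pick the generator and encoder at random per query,
    # so an attacker cannot rely on one fixed detection pipeline.
    generate = rng.choice(generators)
    encode = rng.choice(encoders)

    caption = caption_fn(image)      # caption the (possibly attacked) input
    regenerated = generate(caption)  # regenerate an image from that caption
    score = cosine_similarity(encode(image), encode(regenerated))
    return score < threshold, score
```

In practice the callables would wrap a real captioning model, T2I generators, and image encoders, and the threshold would be calibrated on clean data; passing them in as plain functions keeps the sketch independent of any particular library.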
adversarial, vision-language, defense