Safety · arXiv cs.AI · 8 d ago

ASRU: Activation Steering Meets Reinforcement Unlearning for Multimodal Large Language Models

The paper introduces ASRU, a machine-unlearning framework for multimodal large language models (MLLMs) that targets both unlearning effectiveness and the quality of generated outputs. ASRU uses activation redirection to induce refusal behavior, then refines the refusal boundary with a specialized reward function. On the Qwen3-VL model, it reports a 24.6% improvement in unlearning effectiveness and a 5.8-fold improvement in generation quality. For practitioners, the work addresses the difficulty of removing sensitive information while preserving model utility, which bears on both the safety and the usability of MLLMs.

unlearning · multimodal · model-safety
