Training
PreUnlearn: Auditing Collateral Knowledge Damage Before Large Language Model Unlearning
The paper introduces "PreUnlearn," a framework for auditing collateral knowledge damage before unlearning is applied to large language models (LLMs). It quantifies how unlearning effects propagate and finds a decay pattern: collateral damage is strongest near the forget set and diminishes with semantic distance, yet persists across domain boundaries. The study frames forget-set auditing as a predictive task and uses interaction features to identify potential risks before an unlearning run. This gives practitioners a way to check an unlearning request for side effects before committing to it.
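The decay pattern described above can be illustrated with a small sketch. This is not the paper's method: the exponential form, the `peak`, `rate`, and `floor` parameters, and every function name below are assumptions chosen for illustration. Distance to the forget-set centroid stands in for one possible interaction feature between the forget set and a probe item, and the nonzero `floor` models damage that persists across domain boundaries.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def centroid(vectors):
    """Component-wise mean of a list of vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def semantic_distance(forget_embeddings, probe_embedding):
    """Hypothetical interaction feature: cosine distance between a probe
    item and the centroid of the forget set."""
    return 1.0 - cosine_similarity(centroid(forget_embeddings), probe_embedding)


def estimated_damage(distance, peak=1.0, rate=3.0, floor=0.05):
    """Illustrative decay curve (assumed, not taken from the paper):
    damage equals `peak` at distance 0, decays exponentially with
    distance, and never drops below `floor`."""
    return floor + (peak - floor) * math.exp(-rate * distance)


def rank_probes_by_risk(forget_embeddings, probes):
    """Return (name, estimated damage) pairs, highest risk first.
    `probes` maps a probe name to its embedding vector."""
    scored = [
        (name, estimated_damage(semantic_distance(forget_embeddings, emb)))
        for name, emb in probes.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

In an actual audit the embeddings would come from a real encoder and the curve would be fitted to measured post-unlearning accuracy drops. The toy version only shows the shape of the prediction task: score each retained item by its relation to the forget set, then rank items by expected damage.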
llm, unlearning, knowledge