Training
Null-Space Constrained Low-Rank Adaptation for Response-Specified Large Language Model Unlearning
The paper presents Null-Space Constrained Response-Specified Unlearning (NSRU), a framework for unlearning in large language models that uses projection-constrained low-rank adaptation to suppress undesirable knowledge while preserving benign capabilities. NSRU guides unlearning with structured target responses and confines updates to a null space derived from benign representations. On benchmarks such as TOFU and WMDP, it shows improved suppression of unwanted knowledge together with higher model utility. For practitioners, it offers a controlled unlearning mechanism that balances retaining useful knowledge against suppressing harmful content.
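To make the null-space idea concrete, the following is a minimal NumPy sketch, not the paper's implementation. It assumes benign representations are stacked as rows of a matrix, builds a projector onto the directions those rows do not span, and applies it to a low-rank update `B @ A`. The function name `null_space_projector`, the tolerance `tol`, and all shapes are illustrative assumptions.

```python
import numpy as np

def null_space_projector(benign_reps, tol=1e-8):
    """Return a (d x d) projector onto the approximate null space of
    benign_reps (n x d). Hypothetical helper, for illustration only."""
    # Rows of vt are right singular vectors; those beyond the numerical
    # rank are directions that every benign representation is orthogonal to.
    _, s, vt = np.linalg.svd(benign_reps, full_matrices=True)
    rank = int(np.sum(s > tol * s.max())) if s.size else 0
    null_basis = vt[rank:].T  # shape: d x (d - rank)
    return null_basis @ null_basis.T

rng = np.random.default_rng(0)
d_in, d_out, r, n = 16, 8, 2, 5  # assumed toy dimensions

# Toy stand-in for benign hidden representations (n samples, d_in features).
benign = rng.normal(size=(n, d_in))
P = null_space_projector(benign)

# Low-rank factors, as in LoRA-style adaptation: delta_W = B @ A has rank <= r.
A = rng.normal(size=(r, d_in))
B = rng.normal(size=(d_out, r))

# Constrain the update so it acts only on the benign null space.
delta_w = B @ A @ P

# Benign inputs are (numerically) unaffected by the constrained update.
benign_shift = benign @ delta_w.T

# A generic other input (standing in for a forget-set representation)
# can still be changed by the update.
forget = rng.normal(size=(d_in,))
forget_shift = delta_w @ forget

print("max benign shift:", np.abs(benign_shift).max())
print("forget shift norm:", np.linalg.norm(forget_shift))
```

In this toy setup the null space is non-trivial only because there are fewer benign samples than input dimensions. With many samples the representation matrix is typically full rank, so a practical variant would likely keep directions with small (not exactly zero) singular values, which is a design choice this sketch does not attempt to settle.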
llm, unlearning, low-rank adaptation