Research · arXiv cs.CL · 8 d ago

Natively Unlearnable Large Language Models

The paper introduces Natively Unlearnable Large Language Models (NULLs), a model architecture that supports source-specific unlearning while still learning jointly from all sources. NULLs combine a shared backbone of neurons, which holds general information, with a set of sparsely activated sinks that hold source-specific contributions. An individual data source can therefore be unlearned without gradient updates or access to the original data. Experiments on a dataset of approximately 6 million Wikipedia articles suggest that this kind of unlearning can be robust and scalable in large models, while the model's overall language capabilities remain comparable to those of standard transformers.
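The backbone-plus-sinks idea can be illustrated with a toy sketch. This is not the paper's implementation: the class `ToySinkLayer`, its methods, the additive form of the sinks, and the source ids `wiki/A` and `wiki/B` are all hypothetical and chosen only for illustration. It shows one way a source-specific contribution could be dropped without a gradient step or access to the source data.

```python
from dataclasses import dataclass, field


@dataclass
class ToySinkLayer:
    """Toy layer (hypothetical): shared backbone weights plus per-source additive sinks."""
    backbone: list                              # shared weight matrix, one row per output
    sinks: dict = field(default_factory=dict)   # source id -> additive output vector

    def forward(self, x, active_sources=()):
        # Backbone: plain matrix-vector product, shared by every source.
        out = [sum(w * xi for w, xi in zip(row, x)) for row in self.backbone]
        # Sinks: only the listed sources contribute (sparse activation).
        for source in active_sources:
            sink = self.sinks.get(source)
            if sink is not None:
                out = [o + s for o, s in zip(out, sink)]
        return out

    def unlearn(self, source):
        # Assumption: deleting a sink removes that source's contribution.
        # No gradient update and no access to the source data is involved.
        return self.sinks.pop(source, None) is not None


layer = ToySinkLayer(
    backbone=[[1.0, 0.0], [0.0, 1.0]],
    sinks={"wiki/A": [0.5, -0.5], "wiki/B": [0.25, 0.25]},
)
x = [2.0, 3.0]
before = layer.forward(x, ["wiki/A", "wiki/B"])
layer.unlearn("wiki/A")
after = layer.forward(x, ["wiki/A", "wiki/B"])
print(before, after)
```

In this toy, the output computed from the backbone alone is unchanged by `unlearn`, which mirrors the summary's claim that general language capability is kept separate from source-specific contributions.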

