Research
Output Vector Editing for Memorization Mitigation in Large Language Models
The paper introduces output vector editing, a technique for mitigating memorization in large language models (LLMs) that targets the associated privacy and security risks. The method uses constrained optimization to modify the output vectors of the specific MLP neurons responsible for memorized sequences. Across models from 360M to 7B parameters, including OLMo-7B and Llama2-7B, it suppresses up to 87.9% of memorized content. The edit supports both suppression and redirection at varying strengths and remains effective across model sizes. The findings suggest it can substantially improve the security of LLM deployments by reducing the risk that a model reproduces sensitive training data.
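The summary does not state the optimization objective. The following is therefore a minimal sketch under stated assumptions, not the paper's method.

It treats a neuron's output vector `v` as the vector that neuron writes into the residual stream. It approximates that vector's effect on next-token logits by its direct product with a hypothetical unembedding matrix `unembed`, ignoring layer norm and all later layers. It then solves the simplest constrained problem: lower the memorized token's logit as far as possible while keeping the L2 norm of the edit at most `eps`. The function `edit_output_vector` and all of its parameters are hypothetical names chosen for illustration.

```python
import numpy as np


def edit_output_vector(v, unembed, mem_token, eps, target_token=None):
    """Hypothetical sketch, not the paper's algorithm.

    Shift a neuron's output vector v by a delta with ||delta||_2 <= eps so
    that its direct logit contribution (unembed @ v, an assumption that
    ignores layer norm and downstream layers) to mem_token drops as much as
    possible. If target_token is given, the objective becomes the margin
    between the memorized and target tokens (a simple form of redirection).
    """
    # Residual-space direction that raises the memorized token's logit.
    direction = unembed[mem_token].astype(float).copy()
    if target_token is not None:
        # Redirection: also push toward the target token's logit.
        direction -= unembed[target_token]
    norm = np.linalg.norm(direction)
    if norm == 0:
        # Nothing to optimize, e.g. when target_token == mem_token.
        return v.copy()
    # Closed-form minimizer of direction @ (v + delta) s.t. ||delta||_2 <= eps.
    delta = -eps * direction / norm
    return v + delta


# Usage on random toy data (shapes are illustrative assumptions).
rng = np.random.default_rng(0)
d_model, vocab = 16, 50
unembed = rng.normal(size=(vocab, d_model))
v = rng.normal(size=d_model)
edited = edit_output_vector(v, unembed, mem_token=7, eps=0.5)
print("logit before:", unembed[7] @ v, "after:", unembed[7] @ edited)
```

In this toy setting, `eps` plays the role of a suppression-strength knob. The direct logit of the memorized token falls by exactly `eps` times the norm of its unembedding row. Passing `target_token` turns pure suppression into a shift toward another token. The linear, closed-form objective was chosen for brevity and is not claimed to match the paper's optimization.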
llm, memorization, privacy