Research
Detecting Functional Memorization in Code Language Models
The study investigates functional memorization in code language models, using Olmo-3-32B as its case study. The authors compare a model midtrained on specific code against a pretrained reference model. They show that a model can reproduce the functional logic of code it was trained on even when its output shares little text with the original. Auditing metrics based on textual overlap can therefore miss this kind of memorization. The work argues for evaluation methods that also measure functional similarity. Practitioners who care about the integrity and reliability of LLM-generated code need such methods to audit what these models reproduce.
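The gap between textual overlap and functional similarity can be shown with a small sketch. This is not the authors' method. The two snippets are hypothetical: one stands in for "training code" and the other for a "model output". Textual overlap is approximated with Python's `difflib.SequenceMatcher.ratio()`. Functional similarity is approximated as the fraction of random test inputs on which both functions return the same value. Both metrics are illustrative stand-ins chosen for this example.

```python
import difflib
import random

# Hypothetical "training" snippet: sum of squares of the even numbers in a list.
TRAINING_SRC = """
def f(xs):
    total = 0
    for x in xs:
        if x % 2 == 0:
            total += x * x
    return total
"""

# Hypothetical "model output": same behavior, very different text.
GENERATED_SRC = """
def g(values):
    return sum(v ** 2 for v in values if not v & 1)
"""


def textual_overlap(a: str, b: str) -> float:
    """Character-level similarity ratio in [0, 1] (a stand-in for text-based audits)."""
    return difflib.SequenceMatcher(None, a, b).ratio()


def load_function(src: str, name: str):
    """Execute a source string and return the named function.
    Only do this with trusted code; real model outputs belong in a sandbox."""
    namespace: dict = {}
    exec(src, namespace)
    return namespace[name]


def functional_similarity(f1, f2, inputs) -> float:
    """Fraction of test inputs on which both functions return the same value."""
    matches = sum(1 for arg in inputs if f1(arg) == f2(arg))
    return matches / len(inputs)


# Deterministic random test inputs: lists of small integers, including negatives.
rng = random.Random(0)
test_inputs = [
    [rng.randint(-50, 50) for _ in range(rng.randint(0, 10))] for _ in range(200)
]

f = load_function(TRAINING_SRC, "f")
g = load_function(GENERATED_SRC, "g")

overlap = textual_overlap(TRAINING_SRC, GENERATED_SRC)
func_sim = functional_similarity(f, g, test_inputs)
print(f"textual overlap:       {overlap:.2f}")
print(f"functional similarity: {func_sim:.2f}")
```

The two functions agree on every sampled input, so a text-based check would rate them as only loosely related while a behavioral check would flag them as equivalent. Random-input testing can only provide evidence of equivalence, not proof, so a real audit would need a more careful test design.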
llmcodememorization