Safety
Let Them Steal: Trapping Large Language Model Extraction Attacks with Knowledge Honeypot
The paper introduces "Knowledge Trap," a defense against model extraction attacks on large language models that uses a Honeypot Knowledge Graph (HKG) to steer attackers toward low-value knowledge. In experiments in the medical and financial domains, it reduces surrogate agreement by 6.2% while maintaining performance for legitimate users. The approach offers practitioners a way to harden LLMs against extraction without degrading the user experience.
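The summary reports its headline result as surrogate agreement but does not define the metric. As a rough illustration only, the sketch below assumes the common reading: the fraction of held-out queries on which the attacker's surrogate model gives the same answer as the protected model. The function name, the case-insensitive exact-match rule, and the toy answers are all hypothetical and are not taken from the paper.

```python
from typing import Sequence


def surrogate_agreement(victim_answers: Sequence[str],
                        surrogate_answers: Sequence[str]) -> float:
    """Fraction of queries where the surrogate's answer matches the victim's.

    Assumption: answers are compared by case-insensitive exact match after
    trimming whitespace. The paper may use a different matching rule.
    """
    if len(victim_answers) != len(surrogate_answers):
        raise ValueError("answer lists must have the same length")
    if not victim_answers:
        raise ValueError("at least one query is required")
    matches = sum(
        v.strip().lower() == s.strip().lower()
        for v, s in zip(victim_answers, surrogate_answers)
    )
    return matches / len(victim_answers)


# Toy data for illustration only (not results from the paper).
victim = ["A", "B", "C", "D"]
surrogate_undefended = ["A", "B", "C", "A"]  # surrogate trained on clean outputs
surrogate_defended = ["A", "B", "D", "A"]    # surrogate trained on honeypot-steered outputs

baseline = surrogate_agreement(victim, surrogate_undefended)
defended = surrogate_agreement(victim, surrogate_defended)
print(f"agreement without defense: {baseline:.2f}")
print(f"agreement with defense:    {defended:.2f}")
```

Under this reading, a lower agreement score for the defended case means the stolen model tracks the original less closely, which is the direction of the effect the summary describes.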
llm, extraction-attacks, defense