Safety · arXiv cs.AI · 9 d ago

Let Them Steal: Trapping Large Language Model Extraction Attacks with Knowledge Honeypot

The paper introduces "Knowledge Trap," a defense against model extraction attacks on large language models that uses a Honeypot Knowledge Graph (HKG) to steer attackers toward low-value knowledge. In experiments in the medical and financial domains, it reports a 6.2% reduction in surrogate agreement while maintaining performance for legitimate users. The approach offers practitioners a way to harden LLMs against extraction without degrading the user experience.

llm · extraction-attacks · defense
