Safety · arXiv cs.AI · 7 d ago

FENCE: A Financial and Multimodal Jailbreak Detection Dataset

FENCE, a bilingual (Korean-English) multimodal dataset, has been released to improve jailbreak detection in financial applications of Large Language Models (LLMs) and Vision Language Models (VLMs). It pairs finance-relevant queries with image-grounded threats. Evaluations on the dataset reveal vulnerabilities in both GPT-4o and open-source models, which show significant attack success rates. A baseline detector trained on FENCE reached 99% in-distribution accuracy, suggesting the dataset can help make detection systems more robust in sensitive domains.

jailbreak · llm · dataset
