Safety
FENCE: A Financial and Multimodal Jailbreak Detection Dataset
FENCE is a bilingual (Korean-English) multimodal dataset built to improve jailbreak detection in financial applications that use Large Language Models (LLMs) and Vision Language Models (VLMs). It pairs finance-relevant queries with image-grounded threats and reveals vulnerabilities in models such as GPT-4o and open-source alternatives, which show significant attack success rates. A baseline detector trained on FENCE reached 99% in-distribution accuracy, suggesting the dataset can help make detection systems more robust in sensitive domains.
jailbreak, llm, dataset