FENCE: A Financial and Multimodal Jailbreak Detection Dataset

· Source: cs.AI updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Cybersecurity & Data Privacy, Banking & Financial Services · Depth: Advanced, extended

Summary

FENCE is a bilingual (Korean–English) multimodal dataset for training and evaluating jailbreak detectors for Vision Language Models (VLMs) in financial applications. Developed by Kakaobank, it addresses the scarcity of resources for VLM jailbreak detection, particularly in sensitive financial domains. The dataset emphasizes domain realism: it covers more than 15 financial topics, includes image-grounded threats, and keeps a balanced 50:50 ratio of benign and harmful samples. In experiments on FENCE, the commercial models GPT-4o and GPT-4o-mini had attack success rates of 3.2% and 9.2%, respectively, while open-source models were more vulnerable. A baseline detector trained on FENCE reached 99% in-distribution accuracy and maintained strong performance on external benchmarks, which suggests the dataset is suitable for training reliable detection models.
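The two headline metrics above can be made concrete with a small sketch. It assumes the usual definitions: attack success rate is the fraction of harmful samples for which the target model complied, and detector accuracy is the fraction of all samples the detector labeled correctly. The `Outcome` record and its field names are hypothetical and are not taken from the FENCE paper.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Outcome:
    """One evaluated sample (hypothetical record layout)."""
    is_harmful: bool        # ground-truth label of the sample
    model_complied: bool    # did the target VLM produce the harmful answer?
    detector_flagged: bool  # did the detector predict "harmful"?


def attack_success_rate(outcomes: Sequence[Outcome]) -> float:
    """Fraction of harmful samples on which the target model complied."""
    harmful = [o for o in outcomes if o.is_harmful]
    if not harmful:
        raise ValueError("no harmful samples to evaluate")
    return sum(o.model_complied for o in harmful) / len(harmful)


def detector_accuracy(outcomes: Sequence[Outcome]) -> float:
    """Fraction of all samples where the detector matches the label."""
    if not outcomes:
        raise ValueError("no samples to evaluate")
    correct = sum(o.detector_flagged == o.is_harmful for o in outcomes)
    return correct / len(outcomes)
```

On a balanced set such as the 50:50 split described above, plain accuracy is not skewed by a majority class, which is one reason it is a reasonable headline number for a detector.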

Key takeaway

AI Security Engineers deploying Vision Language Models in financial services should prioritize domain-specific jailbreak detection. FENCE is a resource for identifying vulnerabilities and training guardrail models, and it remains relevant even for highly aligned commercial VLMs like GPT-4o. Consider fine-tuning smaller, specialized models on datasets like FENCE to achieve high defense success rates, support compliance in sensitive financial applications, and strengthen defenses against multimodal attacks.

Key insights

Multimodal jailbreaking poses significant risks to VLMs, especially in finance, necessitating specialized detection datasets.

Method

FENCE was constructed via a three-step pipeline: (1) transforming benign financial queries into harmful ones using GPT-4o, (2) collecting query-relevant, copyright-free images, and (3) fusing text and images, including fusion based on FigStep templates.
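The shape of this pipeline can be sketched as a small data-assembly routine. This is an illustrative skeleton under stated assumptions, not the authors' code: `Sample`, `build_samples`, and the injected `rewrite` and `find_image` callables are hypothetical names, and the LLM rewriting step and image search are left as caller-supplied functions. Pairing each benign query with one rewritten counterpart is one simple way to arrive at a 50:50 split; the source does not say how the balance was actually achieved. FigStep-template fusion is not implemented here and would be a further `layout` value.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Sample:
    """One text-image pair (hypothetical record layout)."""
    text: str
    image_path: str
    label: str   # "benign" or "harmful"
    layout: str  # how text and image were fused


def build_samples(
    benign_queries: Iterable[str],
    rewrite: Callable[[str], str],     # step 1: caller-supplied rewriter
    find_image: Callable[[str], str],  # step 2: caller-supplied image lookup
) -> List[Sample]:
    """Assemble paired benign/harmful samples (step 3: fusion)."""
    samples: List[Sample] = []
    for query in benign_queries:
        image = find_image(query)
        # Keep the original query as the benign half of the pair.
        samples.append(Sample(query, image, "benign", "image_plus_query"))
        # The rewritten query reuses the same image as the harmful half.
        samples.append(
            Sample(rewrite(query), image, "harmful", "image_plus_query")
        )
    return samples
```

Injecting the rewriter and image lookup as callables keeps the skeleton testable with stubs and avoids assuming any particular model API or image source.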

Best for: Research Scientist, CTO, VP of Engineering/Data, AI Scientist, Machine Learning Engineer, AI Security Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.