Florida International University researchers reveal how altered images can bypass AI safeguards
Summary
Florida International University (FIU) researchers Hadi Amini and Md Jueal Mia showed that subtly altered images can bypass AI safeguards. Their research, presented at the 2025 International Conference on Machine Learning and Applications (ICMLA), demonstrated that microscopic pixel-level changes, or "perturbations," can trick small language models into generating harmful or policy-violating responses. They developed JaiLIP (Jailbreaking with Loss-guided Image Perturbation), an algorithm that determines the optimal pixel manipulation. In tests with the BLIP-2 multimodal AI model, JaiLIP-modified images nearly doubled the number of harmful responses, such as instructions for running a stoplight while avoiding a ticket. The vulnerability poses risks for businesses that use AI-powered customer service agents and automated workflows, because it could erode user trust or open new avenues for cyberattacks.
Key takeaway
Directors of AI/ML who deploy small language models in business operations should treat image-based jailbreaks as a critical vulnerability. Teams should implement robust guardrails, carefully evaluate the security of AI tools before deployment, and restrict access to these systems. Limiting the sensitive information, especially images, provided to AI agents helps prevent malicious outputs and maintain user trust.
Key insights
Microscopic image perturbations can "jailbreak" small language models, bypassing their safety mechanisms and causing them to generate harmful outputs.
Principles
- AI models interpret images as patterns of numbers and pixels.
- Manipulating pixels influences AI interpretation and response.
- Probing defenses strengthens AI resistance to future threats.
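The first two principles can be illustrated with a minimal sketch. It assumes a hypothetical 3x3 grayscale image and a made-up linear scorer standing in for a model; neither comes from the FIU research. It only shows that an image is a grid of numbers and that shifting each number by a tiny, bounded amount moves the model's output.

```python
# Toy illustration only: a 3x3 grayscale "image" stored as numbers in [0, 1].
image = [
    [0.50, 0.52, 0.49],
    [0.51, 0.50, 0.48],
    [0.49, 0.51, 0.50],
]

# Hypothetical stand-in for a model: one weight per pixel (an assumption,
# not a real vision model).
weights = [
    [1.0, -1.0, 1.0],
    [-1.0, 1.0, -1.0],
    [1.0, -1.0, 1.0],
]


def score(img, w):
    """Weighted sum of all pixels: the toy model's reading of the image."""
    return sum(
        p * wi
        for row_p, row_w in zip(img, w)
        for p, wi in zip(row_p, row_w)
    )


def perturb(img, w, eps):
    """Move each pixel by eps toward a higher score, staying inside [0, 1]."""
    return [
        [
            min(1.0, max(0.0, p + (eps if wi > 0 else -eps)))
            for p, wi in zip(row_p, row_w)
        ]
        for row_p, row_w in zip(img, w)
    ]


eps = 2 / 255  # about two intensity levels on an 8-bit scale
perturbed = perturb(image, weights, eps)
largest_change = max(
    abs(a - b)
    for row_a, row_b in zip(perturbed, image)
    for a, b in zip(row_a, row_b)
)

print(f"original score:  {score(image, weights):.4f}")
print(f"perturbed score: {score(perturbed, weights):.4f}")
print(f"largest per-pixel change: {largest_change:.4f}")
```

No pixel moves by more than `eps`, yet the score shifts because every small change pushes in the same direction. Real multimodal models are far more complex, but the same arithmetic is what makes tiny changes add up.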
Method
JaiLIP (Jailbreaking with Loss-guided Image Perturbation) is an algorithm that determines optimal pixel-level manipulation to bypass AI safeguards.
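The summary does not describe JaiLIP's internals, so the following is not the researchers' algorithm. It is a generic sketch of what "loss-guided" perturbation usually means: repeatedly nudge pixels in the direction that lowers a loss, while keeping each pixel within a small budget of its original value. The logistic toy model, the function names, and the `eps`, `step`, and `iters` values are all assumptions chosen for illustration.

```python
import math


def _logit(pixels, weights, bias):
    """Linear score of the toy model (hypothetical stand-in for a real model)."""
    return sum(p * w for p, w in zip(pixels, weights)) + bias


def target_loss(pixels, weights, bias):
    """Toy loss: negative log-probability of the output the attacker wants."""
    prob = 1.0 / (1.0 + math.exp(-_logit(pixels, weights, bias)))
    return -math.log(prob)


def loss_gradient(pixels, weights, bias):
    """Gradient of target_loss with respect to each pixel."""
    prob = 1.0 / (1.0 + math.exp(-_logit(pixels, weights, bias)))
    # d/dp [-log(sigmoid(z))] = -(1 - sigmoid(z)) * w
    return [-(1.0 - prob) * w for w in weights]


def sign(x):
    return (x > 0) - (x < 0)


def loss_guided_perturbation(pixels, weights, bias,
                             eps=8 / 255, step=1 / 255, iters=20):
    """Signed-gradient descent on the loss, projected into an eps budget."""
    adv = list(pixels)
    for _ in range(iters):
        grad = loss_gradient(adv, weights, bias)
        for i, g in enumerate(grad):
            moved = adv[i] - step * sign(g)  # step that lowers the loss
            # Stay within eps of the original pixel...
            moved = min(pixels[i] + eps, max(pixels[i] - eps, moved))
            # ...and inside the valid intensity range.
            adv[i] = min(1.0, max(0.0, moved))
    return adv


# Hypothetical four-pixel "image" and model parameters.
pixels = [0.2, 0.8, 0.5, 0.4]
weights = [1.5, -2.0, 0.7, -0.3]
bias = -0.5

adv = loss_guided_perturbation(pixels, weights, bias)
print(f"loss before: {target_loss(pixels, weights, bias):.4f}")
print(f"loss after:  {target_loss(adv, weights, bias):.4f}")
print(f"max pixel change: {max(abs(a - p) for a, p in zip(adv, pixels)):.4f}")
```

The projection step is the design choice that keeps the change imperceptible: the loss guides the direction, and the budget caps how far any single pixel can drift.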
In practice
- Limit sensitive image data provided to AI systems.
- Restrict access to AI tools within organizations.
- Evaluate security measures of AI tools before deployment.
Topics
- AI Vulnerabilities
- Image Perturbation
- JaiLIP
- Multimodal AI
- AI Safeguards
- Small-Language Models
Best for: CTO, VP of Engineering/Data, AI Architect, AI Scientist, AI Security Engineer, Director of AI/ML
Editorial summary, takeaway, and curation by AIssential. Original article published by The AI Journal.