Florida International University researchers reveal how altered images can bypass AI safeguards

2026-06-22 · Source: The AI Journal · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Cybersecurity & Data Privacy · Depth: Advanced, quick

Summary

Florida International University (FIU) researchers Hadi Amini and Md Jueal Mia showed that subtly altered images can bypass AI safeguards. Their research, presented at the 2025 International Conference on Machine Learning and Applications (ICMLA), demonstrated that microscopic pixel-level changes, or "perturbations," can trick small language models into generating harmful or policy-violating responses. They developed JaiLIP (Jailbreaking with Loss-guided Image Perturbation), an algorithm that determines which pixel manipulations are most effective. In tests with the BLIP-2 multimodal AI model, JaiLIP-modified images nearly doubled the number of harmful responses, such as instructions for running a stoplight while avoiding a ticket. The vulnerability poses risks for businesses that use AI-powered customer service agents and automated workflows: it could erode user trust or open avenues for cyberattacks.

Key takeaway

Directors of AI/ML who deploy small language models in business operations should treat image-based jailbreaks as a critical vulnerability. Teams should implement robust guardrails, evaluate the security of AI tools before deployment, and restrict access to these systems. Limiting the sensitive information, especially images, given to AI agents helps prevent malicious outputs and maintain user trust.

Key insights

Microscopic image perturbations can "jailbreak" small language models, bypassing their safety mechanisms and causing them to generate harmful outputs.

Principles

AI models interpret images as grids of numeric pixel values.
Manipulating those pixel values changes how the model interprets the image and how it responds.
Probing defenses strengthens AI resistance to future threats.
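
The first two principles above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the 8x8 random "image", the linear `score` function standing in for a model, and the `epsilon` budget are hypothetical and are not taken from the FIU research. It only shows that an image is a grid of numbers and that a change of at most 2/255 per pixel can still shift a model's output.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 8x8 grayscale "image": a grid of numbers in [0, 1).
image = rng.random((8, 8))

# Hypothetical stand-in for a model: a fixed linear scorer over the pixels.
weights = rng.standard_normal((8, 8))


def score(img):
    """Return the toy model's output for an image (weighted sum of pixels)."""
    return float(np.sum(weights * img))


# Nudge each pixel by at most epsilon in the direction that raises the score,
# then clip so every value remains a valid pixel intensity.
epsilon = 2 / 255
perturbed = np.clip(image + epsilon * np.sign(weights), 0.0, 1.0)

max_change = float(np.max(np.abs(perturbed - image)))
print("clean score:    ", score(image))
print("perturbed score:", score(perturbed))
print("largest per-pixel change:", max_change)
```

A real multimodal model is far from linear, but the same idea applies: many tiny, coordinated pixel changes can add up to a large change in what the model computes.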

Method

JaiLIP (Jailbreaking with Loss-guided Image Perturbation) is an algorithm that determines optimal pixel-level manipulation to bypass AI safeguards.
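
The summary does not describe JaiLIP's internals, so the following is not the authors' algorithm. It is a minimal sketch of a generic loss-guided perturbation loop (iterative signed-gradient steps with projection onto a small per-pixel budget), assuming a toy logistic model whose gradient can be written by hand. The function names, `epsilon`, `step`, and `iters` are all hypothetical choices for illustration.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def target_loss(image, weights, bias):
    """Loss that is small when the toy model's output is close to 1."""
    p = sigmoid(np.sum(weights * image) + bias)
    return float(-np.log(p + 1e-12))


def loss_guided_perturbation(image, weights, bias,
                             epsilon=8 / 255, step=1 / 255, iters=20):
    """Lower target_loss while keeping each pixel within epsilon of the original."""
    perturbed = image.copy()
    for _ in range(iters):
        p = sigmoid(np.sum(weights * perturbed) + bias)
        # Gradient of -log(sigmoid(z)) with respect to the pixels.
        grad = -(1.0 - p) * weights
        # Take a small signed step that decreases the loss.
        perturbed = perturbed - step * np.sign(grad)
        # Project back into the epsilon budget and the valid pixel range.
        perturbed = np.clip(perturbed, image - epsilon, image + epsilon)
        perturbed = np.clip(perturbed, 0.0, 1.0)
    return perturbed


rng = np.random.default_rng(0)
image = rng.random((8, 8))             # hypothetical clean image
weights = rng.standard_normal((8, 8))  # hypothetical toy model parameters
bias = -float(np.sum(weights * image)) - 2.0  # clean output starts well below 0.5

adversarial = loss_guided_perturbation(image, weights, bias)
print("loss before:", target_loss(image, weights, bias))
print("loss after: ", target_loss(adversarial, weights, bias))
print("largest per-pixel change:", float(np.max(np.abs(adversarial - image))))
```

In an attack on a real vision-language model, the hand-written gradient would be replaced by automatic differentiation through the model, and the loss would measure how closely the model's response matches the attacker's target. Those details are assumptions here, not claims about how JaiLIP works.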

In practice

Limit sensitive image data provided to AI systems.
Restrict access to AI tools within organizations.
Evaluate security measures of AI tools before deployment.

Topics

AI Vulnerabilities
Image Perturbation
JaiLIP
Multimodal AI
AI Safeguards
Small Language Models

Best for: CTO, VP of Engineering/Data, AI Architect, AI Scientist, AI Security Engineer, Director of AI/ML

Editorial summary, takeaway, and curation by AIssential. Original article published by The AI Journal.