On Surjectivity of Neural Networks: Can you elicit any behavior from your model?

2026-06-17 · Source: stat.ML updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Cybersecurity & Data Privacy · Depth: Expert, extended

Summary

A new study proves that many fundamental neural network architectures, including GPT-style transformers and deterministic diffusion models, are "almost always surjective." Surjectivity means that any specified output can be generated by some input, which raises significant concerns about model safety and jailbreak vulnerabilities. Using differential topology, the research demonstrates that core building blocks such as Pre-LayerNorm and Multi-Layer Perceptrons (MLPs) with LeakyReLU activation are almost always surjective. By contrast, attention with softmax and MLPs with ReLU are not almost always surjective. This structural property implies that, regardless of safety training, surjective models remain theoretically capable of producing harmful or undesirable content, a foundational challenge for AI safety across language, vision, and robotics applications.
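
The LeakyReLU versus ReLU contrast can be illustrated with a minimal sketch. It covers only a special case chosen for clarity, not the paper's general result: a two-layer MLP whose weight matrices are square and invertible (true for almost all square matrices). The names `mlp`, `mlp_preimage`, `relu_mlp_1d`, and the slope `ALPHA` are hypothetical and do not come from the paper.

```python
import numpy as np

ALPHA = 0.1  # LeakyReLU negative slope (illustrative assumption)

def leaky_relu(z, alpha=ALPHA):
    return np.where(z >= 0, z, alpha * z)

def leaky_relu_inv(a, alpha=ALPHA):
    # LeakyReLU is a bijection on the reals, so it has an exact inverse.
    return np.where(a >= 0, a, a / alpha)

def mlp(x, W1, b1, W2, b2):
    return W2 @ leaky_relu(W1 @ x + b1) + b2

def mlp_preimage(y, W1, b1, W2, b2):
    # Undo the layers in reverse order; requires W1 and W2 to be invertible.
    h = np.linalg.solve(W2, y - b2)
    z = leaky_relu_inv(h)
    return np.linalg.solve(W1, z - b1)

rng = np.random.default_rng(0)
d = 4
W1, W2 = rng.normal(size=(d, d)), rng.normal(size=(d, d))
b1, b2 = rng.normal(size=d), rng.normal(size=d)

target = np.array([-3.0, 0.5, 7.0, -1.0])  # arbitrary desired output
x = mlp_preimage(target, W1, b1, W2, b2)
recovered = mlp(x, W1, b1, W2, b2)          # matches target up to rounding

def relu_mlp_1d(x, w1=1.0, b1=0.0, w2=1.0, b2=0.0):
    # Width-1 ReLU MLP: with these weights the output is never negative,
    # so a target such as -1.0 has no pre-image.
    return w2 * max(w1 * x + b1, 0.0) + b2
```

The ReLU counterexample works because ReLU discards the sign of its input, while LeakyReLU keeps enough information to be undone.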

Key takeaway

For AI Security Engineers evaluating generative model safety, this research indicates that "train-for-safety" methods alone are insufficient. The inherent surjectivity of architectures like Transformers and deterministic diffusion models means any harmful output is theoretically reachable by some input. You should complement safety training with "filter-for-safety" mechanisms and develop better metrics beyond output-only evaluations, acknowledging that computational difficulty is not a guaranteed defense against determined attackers.

Key insights

Many modern neural networks are almost always surjective, implying inherent vulnerability to generating any output.

Principles

Surjectivity implies theoretical jailbreak vulnerability.
Wrapping a continuous sublayer in a Pre-LayerNorm residual block makes the block surjective.
Differential topology aids neural network analysis.
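
The Pre-LayerNorm principle above can be made concrete with a small numerical sketch. It assumes the block has the residual form x + f(LN(x)), uses LayerNorm without learned gain or bias, and picks a small tanh sublayer `f`; all of these are illustrative assumptions rather than details taken from the paper. The key fact it shows is that LayerNorm output has norm at most sqrt(d), so the sublayer only ever adds a bounded perturbation to x.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # LayerNorm without learned gain/bias (simplifying assumption).
    centered = x - x.mean()
    return centered / np.sqrt(centered.var() + eps)

rng = np.random.default_rng(0)
d = 4
W = rng.normal(size=(d, d))
W /= np.linalg.norm(W, 2)  # rescale to spectral norm 1

def f(u):
    # Hypothetical continuous sublayer; its output norm is at most 0.5 * sqrt(d).
    return 0.5 * np.tanh(W @ u)

def block(x):
    # Assumed Pre-LayerNorm residual block: identity plus a bounded perturbation.
    return x + f(layer_norm(x))

def find_preimage(y, steps=50):
    # Fixed-point iteration x <- y - f(LN(x)). It contracts here because the
    # perturbation is small relative to the spread of the target; it is not a
    # general-purpose solver.
    x = y.copy()
    for _ in range(steps):
        x = y - f(layer_norm(x))
    return x

target = np.array([10.0, -10.0, 5.0, -5.0])
x_star = find_preimage(target)
residual = np.linalg.norm(block(x_star) - target)

# Even for a very large input, the LayerNorm output stays within norm sqrt(d).
ln_norm = np.linalg.norm(layer_norm(rng.normal(size=d) * 1e3))
```

The iteration is only a convenient way to exhibit one pre-image in an easy case. Existence of a pre-image for every target rests on the topological argument, not on this solver converging.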

Method

The paper uses differential topology, specifically Brouwer degree theory and homotopy arguments, to prove that neural network building blocks are almost always surjective: a map with non-zero degree at a target point must have a pre-image of that point.
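
The degree argument described above can be sketched for the simplest case, an identity map plus a bounded continuous perturbation. This is the standard textbook form of the argument under stated assumptions, not necessarily the paper's exact proof; the symbols $g$, $h$, $M$, $R$, and $H_t$ are introduced here for illustration.

```latex
Let $g(x) = x + h(x)$ on $\mathbb{R}^n$, where $h$ is continuous and
$\sup_x \|h(x)\| \le M$. Fix a target $y$ and choose $R > \|y\| + M$.
Define the homotopy
\[
  H_t(x) = x + t\,h(x), \qquad t \in [0,1],
\]
so that $H_0 = \mathrm{id}$ and $H_1 = g$. For every $x$ with $\|x\| = R$,
\[
  \|H_t(x) - y\| \;\ge\; \|x\| - t\,\|h(x)\| - \|y\| \;\ge\; R - M - \|y\| \;>\; 0,
\]
so $y \notin H_t(\partial B_R)$ for all $t \in [0,1]$. Homotopy invariance of
the Brouwer degree then gives
\[
  \deg(g, B_R, y) \;=\; \deg(\mathrm{id}, B_R, y) \;=\; 1 \;\neq\; 0,
\]
and a non-zero degree implies that some $x \in B_R$ satisfies $g(x) = y$.
```

Since $y$ was arbitrary, $g$ is surjective. Boundedness of $h$ is the only property used, which is why a normalization step that bounds the sublayer's input matters.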

In practice

GPT-style Transformers are almost always surjective.
Deterministic diffusion models are almost always surjective.
Robotics policy networks can be induced to output any action.

Topics

Neural Network Surjectivity
AI Safety
Jailbreak Vulnerabilities
Generative Models
Differential Topology
Transformer Architecture
Diffusion Models

Best for: Research Scientist, CTO, VP of Engineering/Data, AI Scientist, AI Security Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by stat.ML updates on arXiv.org.