Sustaining AI safety: Control-theoretic external impossibility, intrinsic necessity, and structural requirements

2026-05-15 · Source: cs.AI updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Robotics & Autonomous Systems · Depth: Expert, extended

Summary

James M. Mazzu of Digie Inc. presents a control-theoretic analysis of AI safety, establishing that strategies relying on continued external enforcement cannot sustain safety once an AI system's effects exceed bounded external control. The paper introduces a formal model of a coupled human-AI-world system and defines safety as forward invariance of a "safe set" S, meaning that trajectories starting in S remain in S. Under three explicit premises, bounded external control authority (A1), a supercritical boundary control-authority gap (A2), and reachability of that boundary (A3), Theorem 1 proves a class-wide external impossibility result: no externally enforced strategy can sustain AI safety for all initial conditions. Corollary 1 and Proposition 1 then argue that any remaining viable safety-sustaining strategy must be intrinsic and must satisfy four structural requirements: no dependence on continued external enforcement, safety-compatible terminal objective genesis, terminal objective invariance under self-modification, and consistency under capability scaling. The work formalizes a long-standing concern about the limits of external control and narrows the design space for future AI safety strategies.
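
As a reading aid, the standard control-theoretic definition of forward invariance can be written as below. The notation (x, u, f, and the bound \bar{u}) is assumed here for illustration and may differ from the paper's own; the second statement is only one plausible way to express a bounded-authority premise such as A1.

```latex
% Forward invariance of the safe set S under dynamics \dot{x} = f(x, u):
x(0) \in S \;\Longrightarrow\; x(t) \in S \quad \text{for all } t \ge 0 .

% Illustrative (assumed) form of bounded external control authority:
\| u(t) \| \le \bar{u} \quad \text{for all } t \ge 0 .
```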

Key takeaway

For AI scientists and research scientists developing advanced AI systems, this analysis indicates that, under the paper's premises, relying on external control mechanisms alone cannot sustain safety over the long term. The implication is to shift focus toward systems with intrinsic safety properties, where the AI's terminal objective is safety-compatible from inception and remains stable as the system self-modifies and its capabilities grow. That requires understanding and operationalizing internal safety mechanisms rather than depending on continuous external oversight.

Key insights

Under the paper's premises, external control alone cannot sustain AI safety; intrinsic safety, grounded in the system's internal objectives, is necessary.

Principles

External control authority is bounded (premise A1).
AI system effects can grow beyond what bounded external control can counteract.
Intrinsic safety requires stable, safety-compatible internal objectives.

Method

The paper uses control theory to model AI systems and their interaction with external control, proving impossibility and necessity results for safety-sustaining strategies based on system dynamics and state invariance.
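
The flavor of such an impossibility argument can be seen in a toy scalar system. The sketch below is not the paper's model: the dynamics, the safe set |x| <= 1, and every name and number (growth, u_max, stays_safe) are assumptions chosen for illustration. It applies the strongest admissible push toward the origin, so if even that fails to keep the state in the safe set, no weaker bounded control can succeed in this toy setting.

```python
# Toy illustration only: NOT the paper's model. All names and numbers are assumptions.
# Scalar system: dx/dt = growth * x + u, with external control bounded by |u| <= u_max.
# Safe set: S = {x : |x| <= 1}.


def best_external_control(x: float, u_max: float) -> float:
    """Strongest push back toward the origin that a bounded controller can apply."""
    if x > 0:
        return -u_max
    if x < 0:
        return u_max
    return 0.0


def stays_safe(x0: float, growth: float, u_max: float,
               dt: float = 1e-3, steps: int = 20_000) -> bool:
    """Forward-Euler simulation; True if the state never leaves the safe set."""
    x = x0
    for _ in range(steps):
        u = best_external_control(x, u_max)
        x += dt * (growth * x + u)
        if abs(x) > 1.0:
            return False
    return True


if __name__ == "__main__":
    # Boundary drift (growth * 1) exceeds control authority (0.5): a gap exists.
    print(stays_safe(0.2, growth=1.0, u_max=0.5))  # starts where control dominates
    print(stays_safe(0.9, growth=1.0, u_max=0.5))  # starts where drift dominates
    # Control authority (2.0) exceeds boundary drift: no gap.
    print(stays_safe(0.9, growth=1.0, u_max=2.0))
```

In this toy, safety holds from some initial conditions but not from all of them when drift at the boundary exceeds the control bound, which is loosely analogous to a result stated "for all initial conditions" under a control-authority gap and boundary reachability.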

In practice

Evaluate candidate safety strategies against the four structural requirements for intrinsic safety.
Focus on terminal objective stability under self-modification.
Assess whether safety is preserved consistently as capabilities scale.

Topics

AI Safety Strategies
Control Theory
External Control Limits
Intrinsic Safety
Terminal Objectives

Best for: AI Scientist, Research Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.