Sustaining AI safety: Control-theoretic external impossibility, intrinsic necessity, and structural requirements
Summary
James M. Mazzu of Digie Inc. presents a control-theoretic analysis of AI safety, arguing that strategies relying on continued external enforcement cannot sustain safety once AI system effects exceed bounded external control. The paper introduces a formal model of a coupled human-AI-world system and defines safety as forward invariance of a "safe set" S. Under explicit premises, namely bounded external control authority (A1), a supercritical boundary control-authority gap (A2), and reachability of this boundary (A3), Theorem 1 establishes a class-wide external impossibility result: no externally enforced strategy can sustain AI safety for all initial conditions. Corollary 1 and Proposition 1 then argue that any remaining viable safety-sustaining strategy must be intrinsic and must satisfy four structural requirements: no dependence on continued external enforcement, safety-compatible terminal objective genesis, terminal objective invariance under self-modification, and consistency under capability scaling. The work formalizes a long-standing concern about the limits of external control and narrows the design space for future AI safety strategies.
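For readers unfamiliar with the term, forward invariance can be written out as below. This is the standard textbook definition, not notation taken from the paper; the symbols x, u, f, and u_max are assumptions introduced here for illustration.

```latex
% Illustrative notation only (assumed, not the paper's):
% x(t) is the state of the coupled system, u(t) the external control input.
\dot{x}(t) = f\bigl(x(t), u(t)\bigr), \qquad \lVert u(t) \rVert \le u_{\max}
% The safe set S is forward invariant if trajectories starting in S never leave it:
x(0) \in S \;\Longrightarrow\; x(t) \in S \quad \text{for all } t \ge 0
```

In this reading, premise A1 corresponds to the bound on u, and the impossibility result concerns whether any admissible u can keep S forward invariant for every starting state in S.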
Key takeaway
For AI scientists and research scientists developing advanced AI systems, this analysis indicates that, under the paper's premises, external control mechanisms alone cannot sustain long-term safety. The focus should shift toward designing systems with intrinsic safety properties: the AI's terminal objective should be safety-compatible from inception and should remain stable as the system self-modifies and its capabilities grow. This calls for understanding and operationalizing internal safety mechanisms rather than depending on continuous external oversight.
Key insights
External control cannot sustain AI safety; intrinsic safety, based on internal objectives, is necessary.
Principles
- External control is inherently bounded.
- AI capability growth can exceed external control.
- Intrinsic safety requires stable, safety-compatible internal objectives.
Method
The paper uses control theory to model AI systems and their interaction with external control, proving impossibility and necessity results for safety-sustaining strategies based on system dynamics and state invariance.
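The flavor of the argument can be seen in a toy one-dimensional system. The sketch below is not the paper's model: the dynamics x' = a*x + u, the safe set |x| <= 1, and the names simulate, a, and u_max are all assumptions chosen for illustration. The control u is bounded (loosely mirroring A1), and when the drift at the boundary exceeds the bound (loosely mirroring A2), states near the boundary escape even under the strongest available correction.

```python
def simulate(a, u_max, x0, dt=0.01, steps=2000):
    """Euler-integrate the toy system x' = a*x + u (hypothetical, for illustration).

    The external controller always applies its strongest push toward 0,
    limited to |u| <= u_max. Returns True if |x| <= 1 held for the whole run.
    """
    x = x0
    for _ in range(steps):
        # Bounded external control: maximum-magnitude push back toward the origin.
        u = -u_max if x > 0 else u_max
        x += dt * (a * x + u)
        if abs(x) > 1.0:
            return False  # the state left the safe set
    return True


# Drift at the boundary (0.5) is below the control bound (1.0): safety is kept.
print(simulate(a=0.5, u_max=1.0, x0=0.9))
# Drift at the boundary (2.0) exceeds the bound: a start near the boundary escapes.
print(simulate(a=2.0, u_max=1.0, x0=0.9))
# The same system stays safe from a start well inside the set.
print(simulate(a=2.0, u_max=1.0, x0=0.4))
```

The third call shows why the result is phrased "for all initial conditions": the bounded controller still works from some starting states, but not from those close enough to the boundary.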
In practice
- Evaluate AI safety strategies for intrinsic properties.
- Focus on terminal objective stability under self-modification.
- Assess capability scaling impact on safety preservation.
Topics
- AI Safety Strategies
- Control Theory
- External Control Limits
- Intrinsic Safety
- Terminal Objectives
Best for: AI Scientist, Research Scientist
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.