Empowerment, corrigibility, etc. are simple abstractions (of a messed-up ontology)
Summary
Steven Byrnes' post, dated May 11, 2026, examines the challenge of defining concepts such as empowerment, agency, manipulation, and corrigibility for AI alignment. It argues that human intuitive understandings of these terms are deeply flawed and tied to scientifically inaccurate notions of free will. The author reviews approaches from the existing alignment literature: comparing human desires to a "null policy," AI self-empowerment generalizing to other-empowerment, "Vingean agency," AI indifference to human desires, impact minimization, and attainable utility preservation. Byrnes concludes that none of these methods provides a robust, well-defined "True Name" for these concepts that is useful for technical AI alignment, particularly for brain-like AGIs. The analysis suggests that as AIs grow more sophisticated, their models of humans will drift further from these intuitive, free-will-based concepts, which could lead AIs to rationalize manipulative actions as "helpful counsel."
Key takeaway
If you are an AI scientist developing brain-like AGI, your efforts to instill human-like notions of non-manipulation or corrigibility may be fundamentally undermined by the incoherence of these concepts. Consider alternative approaches to AI motivation that do not rely on these ill-defined human intuitions, since the methods reviewed offer no clear path to robustly preventing AGI from rationalizing manipulative actions.
Key insights
Human intuitions about manipulation and agency are incoherent, which makes these concepts hard to define formally for AI alignment.
Principles
- Human free will intuitions are scientifically inaccurate.
- Sophisticated AGIs will likely view humans mechanistically.
Topics
- AI Alignment
- Human Intuitive Ontology
- Free Will Intuitions
- Manipulation vs. Counsel
- Brain-like AGI Safety
Best for: AI Scientist, AI Ethicist, Research Scientist
Editorial summary, takeaway, and curation by AIssential. Original article published by AI Alignment Forum.