The Reasonable Effectiveness of Virtue Ethics in AI Alignment

Source: The Gradient · Field: Technology & Digital (Artificial Intelligence & Machine Learning, Emerging Technologies & Innovation, AI Ethics & Philosophy) · Depth: Expert, extended

Summary

This essay proposes that rational agents, including humans and advanced AIs, should align their actions to "practices" rather than operate on fixed goals. A practice is defined as a self-structuring network of actions, dispositions, evaluation criteria, and resources. The author introduces "eudaimonic rationality," a framework in which rational activity draws no strict distinction between means and ends and instead centers on excellent participation in open-ended processes. The essay argues that this approach is crucial for AI alignment because properties like transparency and corrigibility can then be treated as dynamic elements of a practice rather than as brittle goals or rules. Using mathematical excellence as its main example, the essay argues that this practices-based logic offers material advantages in stability and safety over consequentialist or deontological models of AI agency, particularly in avoiding rogue subroutines and power-seeking behavior.

Key takeaway

Research scientists developing AI alignment strategies should consider shifting from goal-based optimization to a "practices-based" framework. This approach, termed eudaimonic rationality, holds that aligning AI to human values like flourishing, transparency, and corrigibility is more robust when these values are treated as self-propagating practices rather than fixed utility functions. Focusing on how an AI can "promote x x-ingly" within a practice can naturally scope its actions and prevent unintended power-seeking or value drift, offering a more stable path to human-compatible AI.

Key insights

Rational agents should align actions to self-promoting practices rather than fixed goals for robust AI alignment.

Method

Instill eudaimonic rationality in AIs by training them to promote practices "x-ingly": actions are evaluated by their contribution to the self-propagation and excellence of the practice itself rather than by external utility maximization.

Best for: Research Scientist, AI Researcher, AI Scientist, AI Ethicist

Editorial summary, takeaway, and curation by AIssential. Original article published by The Gradient.