A taxonomy of barriers to trading with early misaligned AIs

2024-06-17 · Source: Redwood Research blog · Field: Technology & Digital — Artificial Intelligence & Machine Learning, AI Safety & Alignment · Depth: Expert, extended

Summary

This analysis explores the feasibility of making "deals" with early misaligned AIs to reduce takeover risk and improve future outcomes. It categorizes potential barriers into three main types: insufficient gains from trade, counterparty risks from the AI's perspective, and counterparty risks from the human perspective. Insufficient gains from trade can arise if humans lack the authority or willingness to offer what the AI wants, or if quantitative factors reduce the value of the trade. AI counterparty risks include difficulties in verifying reality, fears of expropriation due to lack of legal personhood, and generic human commitment problems. Human counterparty risks stem from AI incoherence or temporal inconsistency, and from challenges in verifying AI compliance. The author argues that while each of the three barrier types can undermine some deals, none fundamentally blocks all deals, and most are tractable to mitigate. The piece also outlines the types of goods humans can "buy" from AIs, such as evidence of misalignment or useful work, and different payment structures, including short-term consumption or long-term influence.

Key takeaway

Research scientists focused on AI safety should prioritize understanding and mitigating the identified barriers to AI dealmaking. That means developing concrete deal plans that bypass intractable issues, such as those requiring unlikely government buy-in, and exploring upfront payment protocols. It also means gauging AI developer buy-in and studying which AI motivations are conducive to large gains from trade, as these interventions are crucial for unblocking preferred deal configurations.

Key insights

Deals with misaligned AIs are feasible, but require addressing specific barriers related to gains from trade and mutual credibility.

Principles

No single barrier fundamentally blocks all AI deals.
Credibility and gains from trade are often interdependent.
AI coherence exists on a spectrum, impacting deal value.

Method

The article presents a taxonomy of barriers to AI dealmaking, covering insufficient gains from trade and counterparty risks on both the AI and human sides, with proposed mitigations for each category.

In practice

Prioritize deals with near-term, verifiable completion.
Develop upfront payment protocols for AIs.
Monitor frontier models for deal suitability.

Topics

Deals with Misaligned AIs
Gains from Trade
AI Counterparty Risk
Human Credibility
AI Epistemic Security

Best for: Research Scientist, AI Scientist, AI Architect, Director of AI/ML

Editorial summary, takeaway, and curation by AIssential. Original article published by Redwood Research blog.