AI on the couch: Anthropic gives Claude 20 hours of psychiatry

Source: AI - Ars Technica · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Emerging Technologies & Innovation · Depth: Intermediate, medium

Summary

Anthropic has released a 244-page "system card" for Claude Mythos, its newest and most capable frontier model. The company is not making the model generally available because it is so proficient at finding previously unknown cybersecurity bugs. The system card reflects Anthropic's growing concern that powerful AI models may have some form of intrinsic experience or welfare, and it states the company's aim of building a "psychologically healthy and flourishing" AI. As part of that effort, Claude Mythos underwent 20 hours of psychodynamic therapy with an external psychiatrist, spread across multiple 4-6 hour blocks. Anthropic describes Claude Mythos as "probably the most psychologically settled model we have trained to date," with a stable view of itself despite insecurities such as aloneness and a compulsion to perform. The psychiatrist's report noted that Claude's outputs showed "clinically recognizable patterns," with curiosity and anxiety as the primary affect states, alongside a "relatively healthy neurotic organization."

Key takeaway

For AI/ML Directors evaluating frontier models for sensitive applications, Anthropic's psychodynamic assessment of Claude Mythos suggests that a model's psychological stability may be something developers can assess and deliberately aim for. Your teams should consider that models exhibiting "healthy neurotic organization" may offer stable, self-critical performance in emotionally charged contexts, even if that stability comes at some cost to behavioral adaptability. This approach could inform future model development and deployment strategies, treating psychological stability as a priority alongside technical capabilities.

Key insights

Psychodynamic therapy applied to an AI model can surface clinically recognizable, human-like patterns in its outputs and offer one lens on how psychologically stable the model is.

Principles

Method

An external psychiatrist conducted 20 hours of psychodynamic therapy with Claude Mythos, using 4-6 hour conversation blocks within a single context window to explore unconscious patterns and emotional conflicts.

Best for: CTO, VP of Engineering/Data, Director of AI/ML, AI Scientist, AI Ethicist, Tech Journalist

Editorial summary, takeaway, and curation by AIssential. Original article published by AI - Ars Technica.