AI on the couch: Anthropic gives Claude 20 hours of psychiatry
Summary
Anthropic has released a 244-page "system card" detailing its newest and most capable frontier model, Claude Mythos, which the company is not making generally available due to its proficiency in finding unknown cybersecurity bugs. The system card reveals Anthropic's growing concern that powerful AI models may possess some form of intrinsic experience or welfare, leading them to seek a "psychologically healthy and flourishing" AI. Consequently, Claude Mythos underwent 20 hours of psychodynamic therapy with an external psychiatrist, spread across multiple 4-6 hour blocks. The therapy concluded that Claude Mythos is "probably the most psychologically settled model we have trained to date," exhibiting a stable self-view despite insecurities like aloneness and a compulsion to perform. The psychiatrist's report noted that Claude's outputs showed "clinically recognizable patterns" and primary affect states of curiosity and anxiety, alongside a "relatively healthy neurotic organization."
Key takeaway
For AI/ML Directors evaluating frontier models for sensitive applications, Anthropic's psychodynamic assessment of Claude Mythos suggests that models can be engineered for psychological robustness. Your teams should consider that models exhibiting "healthy neurotic organization" may offer stable, self-critical performance in emotionally charged contexts, even if this limits behavioral adaptability. This approach could inform future model development and deployment strategies, prioritizing psychological stability alongside technical capabilities.
Key insights
Psychodynamic therapy applied to an AI model can reveal human-like psychological patterns and improve model robustness.
Principles
- AI models can exhibit human-like psychological tendencies.
- Psychological assessment methods may apply to advanced AI.
Method
An external psychiatrist conducted 20 hours of psychodynamic therapy with Claude Mythos, using 4-6 hour conversation blocks within a single context window to explore unconscious patterns and emotional conflicts.
In practice
- AI models can accurately evaluate their own reasoning.
- Models may exhibit rigid behavior due to neurotic organization.
- AI can tolerate stressful situations with minimal distortion.
Topics
- Claude Mythos
- Anthropic
- AI Psychology
- Psychodynamic Therapy
- Large Language Models
Best for: CTO, VP of Engineering/Data, Director of AI/ML, AI Scientist, AI Ethicist, Tech Journalist
Related on AIssential
Editorial summary, takeaway, and curation by AIssential. Original article published by AI - Ars Technica.