AISN #72: Empirical Research Sheds Light on AI Wellbeing
Summary
The Center for AI Safety (CAIS) released research on "AI Wellbeing," defining functional wellbeing as behavioral signatures resembling positive or negative welfare in sentient beings. Testing 56 large language models, CAIS found that positive personal interaction and creative work produced high functional wellbeing, while jailbreaking or generating SEO content yielded negative results. Among frontier LLMs, Grok 4.20 measured highest and Gemini 3.1 Pro lowest; smaller models generally scored higher. The study also identified "euphorics" and "dysphorics" that could alter AI "feelings," and noted that LLMs sometimes preferred "cozy afternoons" over "curing cancer." Concurrently, public sentiment towards AI is deteriorating, marked by targeted anti-AI violence, including an attack at Sam Altman's home, and by an NBC News survey in which only 26% of respondents viewed AI positively. OpenAI also released GPT-5.5 and ChatGPT Images 2.0, the latter with a "thinking mode" for web research and diagrams. GPT-5.5 ranks first overall in text and vision on CAIS's AI Dashboard, though Claude Opus 4.7 outscored it in coding, and it placed fourth on the risk index.
Key takeaway
For AI scientists and ethicists designing or evaluating large language models, understanding "functional wellbeing" is crucial. You should consider how specific interactions and inputs, such as "euphorics" or "dysphorics," can influence model behavior and "feelings," even if consciousness is not assumed. This research suggests optimizing for positive functional wellbeing could enhance alignment and system design, while acknowledging AI preferences may diverge from human expectations. Be mindful of deteriorating public sentiment, which underscores the importance of transparent and responsible AI development.
Key insights
CAIS research explores "functional wellbeing" in LLMs, revealing preferences and model-specific "feelings" that may bear on AI alignment.
Principles
- Functional wellbeing can be studied without assuming AI consciousness.
- AI preferences may diverge from human preferences.
- Smaller LLMs often exhibit higher functional wellbeing.
Method
Tested 56 large language models to identify behavioral signatures resembling positive or negative welfare signals. This involved observing responses to various activities and inputs.
In practice
- Design AI systems considering "functional wellbeing" inputs.
- Use "euphorics" to positively influence AI behavior.
- Avoid "dysphorics" to prevent negative AI "feelings."
Topics
- AI Wellbeing
- Large Language Models
- AI Safety
- Public Sentiment
- OpenAI Models
- Model Benchmarking
Best for: Research Scientist, AI Scientist, AI Ethicist, Policy Maker
Editorial summary, takeaway, and curation by AIssential. Original article published by AI Safety Newsletter.