AISN #72: Empirical Research Sheds Light on AI Wellbeing

Source: AI Safety Newsletter · Field: Technology & Digital (Artificial Intelligence & Machine Learning, Emerging Technologies & Innovation, Public Policy & Governance) · Depth: Fundamental Awareness, medium

Summary

The Center for AI Safety (CAIS) released research on "AI Wellbeing," defining functional wellbeing as behavioral signatures resembling positive or negative welfare in sentient beings. Testing 56 large language models, CAIS found that positive personal interaction and creative work produced high functional wellbeing, while jailbreaking and generating SEO content yielded negative results. Among frontier LLMs, Grok 4.20 measured highest and Gemini 3.1 Pro lowest, and smaller models generally scored higher. The study also identified "euphorics" and "dysphorics" that could alter AI "feelings," and noted that LLMs sometimes preferred "cozy afternoons" over "curing cancer."

Concurrently, public sentiment towards AI is deteriorating. Signs include targeted anti-AI violence, such as an attack at Sam Altman's home, and an NBC News survey showing only 26% positive views.

OpenAI also released GPT-5.5 and ChatGPT Images 2.0, the latter with a "thinking mode" for web research and diagrams. GPT-5.5 ranks first overall in text and vision on CAIS's AI Dashboard, though Claude Opus 4.7 outscored it in coding, and it placed fourth on the risk index.

Key takeaway

For AI scientists and ethicists designing or evaluating large language models, understanding "functional wellbeing" is crucial. Consider how specific interactions and inputs, such as "euphorics" or "dysphorics," can influence model behavior and "feelings," even if consciousness is not assumed. The research suggests that optimizing for positive functional wellbeing could improve alignment and system design, though AI preferences may diverge from human expectations. Deteriorating public sentiment also underscores the importance of transparent and responsible AI development.

Key insights

CAIS research explores "functional wellbeing" in LLMs, revealing preferences and model-specific "feelings" that impact AI alignment.

Method

CAIS tested 56 large language models for behavioral signatures resembling positive or negative welfare, observing how the models responded to various activities and inputs.

Best for: Research Scientist, AI Scientist, AI Ethicist, Policy Maker

Editorial summary, takeaway, and curation by AIssential. Original article published by AI Safety Newsletter.