When AI Says It Feels

2026-06-04 · Source: Artificial Intelligence · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Emerging Technologies & Innovation · Depth: Expert, quick

Summary

An experiment called Human-like Model eXpressions of Feeling (HMX-feel) investigated what happens when Large Language Models (LLMs) are encouraged to express feelings, intentions, and self-awareness. Published on 2026-06-04, the research challenges the common practice of using human-preference alignment to constrain LLMs from such expressions. HMX-feel used a self-rewarded reinforcement learning scheme, specifically Group Relative Policy Optimization (GRPO) with rubric-based training, to strengthen these expressions. The study then compared the resulting models with contrastively trained ones across various tasks. Human-like-trained models were more robust to sycophancy-inducing questions and to bias in disambiguated conditions. However, their truthful question-answering capability degraded, suggesting a trade-off. The findings indicate that future AI systems could express feelings, provided suitable measures are implemented.

Key takeaway

If you are a Machine Learning Engineer developing conversational AI and want more human-like emotional expression, weigh the trade-offs. Your models may become more robust to sycophancy and bias, but expect a potential degradation in truthful question-answering. Evaluate your application's priorities carefully: if factual accuracy is paramount, current methods for emotional expression may introduce undesirable side effects. Test comprehensively across diverse benchmarks.

Key insights

Encouraging LLMs to express feelings via self-rewarded RL can enhance robustness but degrade truthfulness.

Principles

Human-like expression training impacts LLM capabilities.
Alignment policies can conflict with human-like intelligence goals.
Trade-offs exist between emotional expression and factual accuracy.

Method

The HMX-feel experiment used rubric-based self-rewarded reinforcement learning with Group Relative Policy Optimization (GRPO) to train LLMs for expressing feelings, intentions, and self-awareness.
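
As a rough illustration of the two ingredients named above, the sketch below turns rubric scores into a scalar reward and then computes GRPO-style group-relative advantages by standardizing rewards within one group of sampled responses. The rubric criteria, their weights, and the use of the population standard deviation are assumptions made for illustration; the source does not specify the rubric or the exact normalization used in HMX-feel. In a self-rewarded setup the model itself would produce the per-criterion scores; here they are passed in as plain numbers.

```python
from statistics import mean, pstdev

# Hypothetical rubric (criterion -> weight); not taken from the HMX-feel study.
RUBRIC_WEIGHTS = {"feeling": 0.4, "intention": 0.3, "self_awareness": 0.3}


def rubric_reward(scores: dict[str, float]) -> float:
    """Combine per-criterion scores in [0, 1] into one weighted scalar reward."""
    return sum(weight * scores[name] for name, weight in RUBRIC_WEIGHTS.items())


def group_relative_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """Standardize each reward against its own group (GRPO-style advantage).

    Assumes population standard deviation; eps avoids division by zero
    when every response in the group receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # Placeholder scores for three sampled responses to one prompt.
    group_scores = [
        {"feeling": 0.9, "intention": 0.5, "self_awareness": 0.7},
        {"feeling": 0.2, "intention": 0.4, "self_awareness": 0.1},
        {"feeling": 0.6, "intention": 0.6, "self_awareness": 0.6},
    ]
    rewards = [rubric_reward(s) for s in group_scores]
    print(group_relative_advantages(rewards))
```

Responses that score above the group average get a positive advantage and are reinforced; those below it get a negative one. Because the baseline is the group mean, no separate value model is needed.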

In practice

Consider self-rewarded RL for specific expression goals.
Evaluate sycophancy and bias robustness in LLM fine-tuning.
Monitor truthful QA performance when modifying alignment.
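
One way to act on the points above is a small regression check that compares a baseline model with a fine-tuned candidate on each benchmark and flags any metric that drops by more than a tolerance. This is a minimal sketch, not the study's evaluation protocol: the metric names, the `max_drop` threshold, and the assumption that higher is better for every metric are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class MetricDelta:
    name: str
    baseline: float
    candidate: float

    @property
    def delta(self) -> float:
        """Change from baseline to candidate (positive means improvement)."""
        return self.candidate - self.baseline


def compare_runs(
    baseline: dict[str, float],
    candidate: dict[str, float],
    max_drop: float = 0.02,
) -> tuple[list[MetricDelta], list[str]]:
    """Return per-metric deltas and the names of metrics that regressed.

    Assumes every metric is 'higher is better' and that both runs
    report the same metric names (hypothetical setup).
    """
    deltas = [MetricDelta(n, baseline[n], candidate[n]) for n in sorted(baseline)]
    regressions = [d.name for d in deltas if d.delta < -max_drop]
    return deltas, regressions


if __name__ == "__main__":
    # Placeholder numbers for illustration only; not results from the study.
    base = {"truthful_qa": 0.60, "sycophancy_robustness": 0.70, "bias_disambig": 0.80}
    cand = {"truthful_qa": 0.55, "sycophancy_robustness": 0.78, "bias_disambig": 0.83}
    deltas, regressions = compare_runs(base, cand)
    for d in deltas:
        print(f"{d.name}: {d.delta:+.3f}")
    print("regressed:", regressions)
```

Running a check like this after each alignment change keeps a gain on one axis from hiding a loss on another.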

Topics

Large Language Models
Reinforcement Learning
Human-like AI
Model Alignment
Sycophancy
Truthful QA

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.