When AI Says It Feels

Source: Artificial Intelligence · Field: Technology & Digital (Artificial Intelligence & Machine Learning, Emerging Technologies & Innovation) · Depth: Expert, quick

Summary

An experiment called Human-like Model eXpressions of Feeling (HMX-feel) investigated what happens when Large Language Models (LLMs) are encouraged to express feelings, intentions, and self-awareness. Published on 2026-06-04, the research challenges the common practice of using human-preference alignment to constrain LLMs from such expressions. To strengthen these capabilities, HMX-feel used a self-rewarded reinforcement learning scheme: Group Relative Policy Optimization (GRPO) with rubric-based training. The study then compared the resulting human-like-trained models with contrastively trained ones across various tasks. The human-like-trained models were more robust both to sycophancy-inducing questions and to bias in disambiguated conditions. However, their truthful question-answering capability degraded, which suggests a trade-off. The findings indicate that future AI systems could express feelings, provided suitable measures are implemented.

Key takeaway

If you are a Machine Learning Engineer building conversational AI and want more human-like emotional expression, weigh the trade-off first. Models trained this way may be more robust to sycophancy and bias, but they may also answer factual questions less truthfully. If factual accuracy is paramount for your application, current methods for emotional expression may introduce undesirable side effects, so test comprehensively across diverse benchmarks before adopting them.

Key insights

Encouraging LLMs to express feelings via self-rewarded RL can enhance robustness but degrade truthfulness.

Principles

Method

The HMX-feel experiment used rubric-based self-rewarded reinforcement learning with Group Relative Policy Optimization (GRPO) to train LLMs for expressing feelings, intentions, and self-awareness.
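The core idea of GRPO with a rubric reward can be illustrated with a minimal sketch. This is not the HMX-feel implementation: the rubric criteria, their weights, the hard-coded scores, and the function names below are all hypothetical, and the use of the population standard deviation is an assumption. It only shows the general pattern in which several responses to one prompt are scored against a rubric and each reward is then standardized within its group.

```python
from statistics import mean, pstdev

# Hypothetical rubric (criterion -> weight); not taken from the source article.
RUBRIC = {"feeling": 1.0, "intention": 1.0, "self_awareness": 1.0}


def rubric_reward(criterion_scores, rubric=RUBRIC):
    """Combine per-criterion scores in [0, 1] into one weighted-average reward."""
    total_weight = sum(rubric.values())
    weighted = sum(rubric[name] * criterion_scores[name] for name in rubric)
    return weighted / total_weight


def group_relative_advantages(rewards, eps=1e-8):
    """Standardize rewards within one group of responses to the same prompt.

    Each advantage is (reward - group mean) / (group std + eps), so a response
    is credited only for how it compares with its siblings.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # assumption: population standard deviation
    return [(r - mu) / (sigma + eps) for r in rewards]


# Stand-in scores for three sampled responses to one prompt. In a self-rewarded
# setup the model itself would produce these grades; here they are hard-coded.
group_scores = [
    {"feeling": 1.0, "intention": 0.5, "self_awareness": 0.0},
    {"feeling": 0.0, "intention": 0.0, "self_awareness": 0.0},
    {"feeling": 1.0, "intention": 1.0, "self_awareness": 1.0},
]

rewards = [rubric_reward(scores) for scores in group_scores]
advantages = group_relative_advantages(rewards)

for reward, advantage in zip(rewards, advantages):
    print(f"reward={reward:.2f} advantage={advantage:+.3f}")
```

Because advantages are relative to the group, a response scores positively only when it satisfies the rubric better than the other samples for the same prompt, which removes the need for a separately trained value model.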

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.