What Do People Actually Want From AI? Mapping Preference Plurality

2023-10-31 · Source: cs.CL updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics, Social Sciences & Behavioral Studies · Depth: Expert, extended

Summary

A study of 1,500 open-ended responses from the PRISM dataset, spanning 75 countries, finds that current Reinforcement Learning from Human Feedback (RLHF) methods for Large Language Model (LLM) alignment fail to capture the complexity of human preferences. The research, published on January 13, 2026, found that most desired values are requested by fewer than a quarter of respondents; "truthfulness" is the sole exception at 49%. Even "truthfulness" is defined in diverse and often incompatible ways, from sourced claims to expert opinion to a willingness to voice unpopular views. Capabilities such as human-like behavior and features such as AI guardrails are contested: some respondents ask for them and others reject them. The authors argue that binary preference models overlook contextual distinctions and commit "epistemic violence" by flattening nuanced, contested signals into a single universal preference model, which they link to problems such as persistent hallucination rates.

Key takeaway

If you develop or regulate LLMs as an AI scientist or policy maker, move beyond aggregated binary preference models. "Universal" alignment is an illusion; prioritize transparent, participatory methods that account for diverse and even conflicting user values. Focus on enabling personalized AI experiences and on regulatory oversight that prevents the algorithmic erasure of minority perspectives, so that AI systems become more equitable and more effective.

Key insights

Human preferences for AI are pluralistic and contextual, challenging current singular alignment methods.

Principles

RLHF aggregates conflicting preferences, leading to "epistemic violence."
Even shared values like "truthfulness" have diverse, incompatible definitions.
Controversial AI features are often flattened by binary preference models.
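
The flattening described in these principles can be illustrated with a toy example. The sketch below is not the study's method and is not RLHF itself (which fits a reward model rather than counting votes); it only shows how pooling binary choices from two groups hides the fact that they disagree. The group labels, response labels, and vote counts are hypothetical.

```python
from collections import Counter

# Hypothetical toy data: (respondent_group, preferred_response_style).
# Group A prefers hedged answers; group B prefers direct answers.
votes = [("A", "hedged")] * 60 + [("B", "direct")] * 40


def pooled_winner(votes):
    """Return the single option favored when all votes are pooled."""
    counts = Counter(choice for _, choice in votes)
    return counts.most_common(1)[0][0]


def per_group_winner(votes):
    """Return the favored option within each group separately."""
    by_group = {}
    for group, choice in votes:
        by_group.setdefault(group, Counter())[choice] += 1
    return {g: c.most_common(1)[0][0] for g, c in by_group.items()}


pooled = pooled_winner(votes)
grouped = per_group_winner(votes)
print("pooled:", pooled)
print("per group:", grouped)
```

The pooled result reports one winner, while the per-group view shows that a sizeable minority consistently prefers the other option. A single aggregated preference signal discards that disagreement.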

Method

The study used mixed-methods analysis, combining qualitative coding and regression analysis on 1,500 open-ended survey responses from the PRISM dataset to identify nuanced AI preferences.
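
As a rough illustration of the descriptive step that follows qualitative coding, the sketch below computes the share of respondents who request each coded value and lists the values requested by fewer than a quarter of them. The codes, the sample responses, and the function names are hypothetical; the study's actual codebook and regression specification are not given in this summary.

```python
from collections import Counter

# Hypothetical coded data: one set of value codes per respondent.
coded_responses = [
    {"truthfulness", "helpfulness"},
    {"truthfulness"},
    {"guardrails"},
    {"human_likeness", "truthfulness"},
    {"guardrails"},
]


def value_shares(coded):
    """Fraction of respondents whose response was assigned each code."""
    n = len(coded)
    counts = Counter(code for response in coded for code in response)
    return {code: count / n for code, count in counts.items()}


def minority_values(shares, threshold=0.25):
    """Codes requested by fewer than `threshold` of respondents, sorted by name."""
    return sorted(code for code, share in shares.items() if share < threshold)


shares = value_shares(coded_responses)
rare = minority_values(shares)
print(shares)
print(rare)
```

Because each respondent contributes a set of codes, a value is counted at most once per person, so the shares are proportions of respondents rather than proportions of mentions.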

In practice

Implement participatory methods for AI alignment principle identification.
Consider personalizing LLM outputs to reflect divergent user preferences.
Design elicitation formats that capture contextual distinctions.

Topics

AI Alignment
Reinforcement Learning from Human Feedback
LLM Preferences
Preference Plurality
AI Ethics
PRISM Dataset

Code references

vectara/hallucination-leaderboard

Best for: Research Scientist, AI Scientist, AI Ethicist, Policy Maker

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CL updates on arXiv.org.