What Do People Actually Want From AI? Mapping Preference Plurality

Source: Computation and Language · Field: Technology & Digital (Artificial Intelligence & Machine Learning), Social Sciences & Behavioral Studies · Depth: Expert, quick

Summary

Large language models (LLMs) are often fine-tuned with Reinforcement Learning from Human Feedback (RLHF) to align with user preferences, but this method has significant limitations. An analysis of 1,500 open-ended responses from the PRISM dataset, spanning 75 countries, shows that people's preferences are highly diverse: most values are requested by fewer than a quarter of respondents, and truthfulness is the sole exception at 49%. Even "truthfulness" hides divergent meanings, covering requests for sourced claims, expert opinions, or even unpopular views. Human-like model behavior and AI guardrails are both contested among respondents. Users also draw contextual distinctions (e.g., "by default" versus "if requested") that binary comparisons fail to capture. Together, these findings expose fundamental problems in current alignment practices and offer an explanation for persistent hallucination rates.

Key takeaway

If you are an AI scientist or ethicist developing alignment strategies, move beyond aggregated, binary preference models. Current RLHF methods likely flatten diverse user values and misread terms like "truthfulness." Design systems that accommodate preference plurality and contextual distinctions (default vs. requested) so they address what users actually ask for and reduce issues like persistent hallucinations.
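
To make the flattening concern concrete, here is a minimal Python sketch. It assumes a toy set of respondents and a simple majority threshold, neither of which comes from the study: the `responses` data, the function names, and the 0.5 cutoff are all hypothetical. It only illustrates how aggregating plural preferences can discard most of what individual people asked for.

```python
from collections import Counter

# Hypothetical toy data: the values each respondent asked for.
# Invented for illustration; NOT taken from the PRISM dataset.
responses = [
    {"truthfulness", "conciseness"},
    {"truthfulness", "humor"},
    {"guardrails"},
    {"human-like tone"},
    {"truthfulness", "guardrails"},
    {"no guardrails"},
    {"conciseness"},
    {"neutrality"},
]


def value_shares(responses):
    """Return the fraction of respondents who mention each value."""
    counts = Counter(value for r in responses for value in r)
    n = len(responses)
    return {value: count / n for value, count in counts.items()}


def majority_values(shares, threshold=0.5):
    """Values that survive a simple majority-style aggregation (assumed rule)."""
    return {value for value, share in shares.items() if share >= threshold}


shares = value_shares(responses)
kept = majority_values(shares)
dropped = set(shares) - kept

for value, share in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f"{value:16s} {share:.3f}")
print("kept by majority rule:", kept)
print("dropped:", len(dropped), "of", len(shares), "values")
```

In this toy data even the most common value falls short of a majority, so a single aggregated target keeps none of the stated preferences. A per-user or per-context representation would preserve them, at the cost of a more complex training signal.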

Key insights

People's AI preferences are diverse, context-dependent, and often contradictory, challenging current single-model alignment methods.

Best for: Research Scientist, AI Scientist, AI Ethicist

Editorial summary, takeaway, and curation by AIssential. Original article published by Computation and Language.