Learning Transferable Latent User Preferences for Human-Aligned Decision Making

2026-05-15 · Source: cs.AI updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Robotics & Autonomous Systems · Depth: Expert, extended

Summary

CLIPR (Conversational Learning for Inferring Preferences and Reasoning) is a framework for improving human-aligned decision-making in Large Language Models (LLMs) by inferring latent user preferences from minimal conversational input. LLMs often struggle with ambiguous requests where several actions are valid but only one matches the user's implicit preferences. CLIPR learns actionable, transferable natural language rules that represent these preferences and refines them iteratively through adaptive feedback. The framework was evaluated on three datasets (AmbiK, Housekeep, Mobile Manipulation) and in a user study with 30 participants, where it consistently outperformed existing methods such as RLHF, in-context learning, TidyBot, GATE, and CIPHER. It achieves higher preference-aligned accuracy and cuts LLM calls by up to 94% compared to baselines, while remaining robust to poor initial rule sets and showing strong cross-model portability of learned rules.

Key takeaway

If you are an AI engineer developing human-facing LLM applications, consider integrating CLIPR to improve user alignment and reduce operational costs. With it, a system can infer complex, latent user preferences from limited interactions and translate them into explicit, transferable rules. This improves decision accuracy in ambiguous scenarios and reduces the number of LLM calls, making personalization more practical and scalable.

Key insights

CLIPR learns transferable natural language rules from minimal user conversation to align LLM decisions with latent preferences.

Principles

Latent user preferences can be captured as actionable natural language rules.
Iterative refinement through adaptive feedback improves preference alignment over time.
Explicit rule sets enhance generalizability and computational efficiency.
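
One way to picture why explicit rules transfer across tasks and models: the rules are plain text, so the same rule list can be placed in front of any request and sent to any model in a single call. The sketch below is illustrative only. The function name build_decision_prompt and the prompt wording are assumptions, not taken from the paper.

```python
from typing import List


def build_decision_prompt(rules: List[str], request: str, options: List[str]) -> str:
    """Assemble one prompt from learned rules (hypothetical helper, not CLIPR's API)."""
    # Rules are plain text, so they can be reused unchanged with any model.
    rule_block = "\n".join(f"- {rule}" for rule in rules)
    # Number the candidate actions so the model can answer with an index.
    option_block = "\n".join(f"{i + 1}. {opt}" for i, opt in enumerate(options))
    return (
        "User preference rules:\n" + rule_block + "\n\n"
        "Request: " + request + "\n"
        "Options:\n" + option_block + "\n"
        "Reply with the number of the option that best follows the rules."
    )
```

Because the rules travel with the prompt, a decision of this kind needs one model call and no further questions to the user.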

Method

CLIPR initializes with example tasks, iteratively elicits preference dimensions via LLM-generated questions, synthesizes dialogue into explicit rules, and refines them adaptively based on performance monitoring and user feedback.
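
The loop described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function learn_rules, its parameters, the prompt text, and the stopping threshold are all hypothetical, and the model, the user, and the evaluator are passed in as plain callables.

```python
from typing import Callable, List, Tuple

# Hypothetical interfaces, not taken from the paper.
LLM = Callable[[str], str]               # prompt -> model reply
AskUser = Callable[[str], str]           # question -> user's answer
Evaluate = Callable[[List[str]], float]  # rules -> accuracy in [0, 1]


def learn_rules(
    llm: LLM,
    ask_user: AskUser,
    evaluate: Evaluate,
    example_tasks: List[str],
    max_rounds: int = 3,    # assumed cap on question rounds
    target: float = 0.9,    # assumed accuracy at which refinement stops
) -> List[str]:
    dialogue: List[Tuple[str, str]] = []
    rules: List[str] = []
    for _ in range(max_rounds):
        # 1. Elicit: have the model ask about a preference it cannot yet infer.
        question = llm(
            "QUESTION\nExample tasks:\n" + "\n".join(example_tasks)
            + "\nCurrent rules:\n" + "\n".join(rules)
            + "\nAsk one question that reveals an unstated preference."
        )
        dialogue.append((question, ask_user(question)))
        # 2. Synthesize: turn the whole dialogue into explicit rules.
        transcript = "\n".join(f"Q: {q}\nA: {a}" for q, a in dialogue)
        reply = llm("RULES\nWrite one preference rule per line.\n" + transcript)
        rules = [line.strip() for line in reply.splitlines() if line.strip()]
        # 3. Monitor: stop once the rules perform well enough.
        if evaluate(rules) >= target:
            break
    return rules
```

In this sketch each round costs two model calls, and the loop ends as soon as the evaluator reports the target accuracy, so a user with simple preferences is asked few questions.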

In practice

Use conversational LLMs to infer implicit user preferences.
Implement adaptive feedback loops to refine preference rules.
Prioritize explicit, transferable rule sets for cross-task generalization.
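
An adaptive feedback loop needs some signal for when the current rules have stopped working. One simple option, shown here as an assumption rather than as the paper's mechanism, is to track whether the user accepted recent decisions and trigger refinement when the rolling acceptance rate drops. The class name FeedbackMonitor, the window size, and the threshold are all hypothetical.

```python
from collections import deque


class FeedbackMonitor:
    """Rolling acceptance tracker (hypothetical helper, not from the paper)."""

    def __init__(self, window: int = 5, threshold: float = 0.8) -> None:
        # Only the most recent `window` outcomes are kept.
        self.outcomes = deque(maxlen=window)
        self.threshold = threshold

    def record(self, accepted: bool) -> None:
        # accepted is True when the user kept the system's decision.
        self.outcomes.append(accepted)

    def accuracy(self) -> float:
        # With no data yet, assume the rules are fine.
        if not self.outcomes:
            return 1.0
        return sum(self.outcomes) / len(self.outcomes)

    def needs_refinement(self) -> bool:
        # Wait for a full window so one early miss does not trigger a rewrite.
        full = len(self.outcomes) == self.outcomes.maxlen
        return full and self.accuracy() < self.threshold
```

When needs_refinement() returns True, the application would go back to the user with a new clarifying question and rebuild the rule set.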

Topics

Conversational Preference Learning
Human-Aligned Decision Making
Large Language Models
Natural Language Rules
Adaptive Feedback Mechanisms

Best for: AI Engineer, NLP Engineer, Research Scientist, AI Scientist, Robotics Engineer, Machine Learning Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.