A Framework for Measuring Appropriate Reliance on Set-Valued AI Advice

Source: cs.AI updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics · Depth: Expert, extended

Summary

This paper introduces the first formal framework for measuring appropriate reliance on set-valued AI advice, addressing a gap where existing methods only consider point predictions. Set-valued advice, such as discrete prediction sets for classification or continuous intervals for regression, is increasingly used to communicate AI uncertainty. The framework operates within the sequential judge-advisor paradigm and defines distinct metrics for each task type. For classification, the Correct Reliance Rate on AI (CRR_AI) and the Correct Reliance Rate on Self (CRR_self) jointly characterize appropriate reliance. For regression, the Quantity of AI Reliance (AIR_quant) measures how much a decision maker uses the AI advice, and the Quality of AI Reliance (AIR_qual) measures whether that use improves the decision relative to the ground truth. Together, these metrics serve as a diagnostic tool: they separate failure modes such as automation bias (over-reliance) and algorithm aversion (under-reliance), which accuracy or Weight of Advice (WoA) alone cannot distinguish, and so inform system design and the evaluation of interventions.

Key takeaway

AI scientists and practitioners evaluating human-AI collaboration should move beyond accuracy and Weight of Advice (WoA) alone. Adding the proposed CRR_AI, CRR_self, AIR_quant, and AIR_qual metrics makes it possible to diagnose specific reliance failure modes such as automation bias or algorithm aversion. That diagnosis in turn supports targeted interventions and systems that foster appropriate reliance on set-valued AI advice, preserve genuine human oversight, and help prevent unintended harms.

Key insights

Appropriate reliance on AI advice requires distinct metrics for set-valued predictions, separating quantity from quality of reliance.

Method

For classification, the framework defines CRR_AI and CRR_self based on AI informativeness. For regression, it uses AIR_quant (behavioral adjustment) and AIR_qual (error improvement) to assess reliance on interval midpoints.
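
The distinction between quantity and quality of reliance in the regression case can be illustrated with a minimal Python sketch. It is not the paper's definition of AIR_quant or AIR_qual, which this summary does not spell out. It assumes a WoA-style shift toward the interval midpoint as a stand-in for "behavioral adjustment" and a reduction in absolute error as a stand-in for "error improvement". The names Trial, reliance_quantity, and reliance_quality are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Trial:
    """One judge-advisor trial with interval-valued advice (hypothetical layout)."""
    initial: float  # estimate before seeing the AI interval
    lower: float    # lower bound of the AI interval
    upper: float    # upper bound of the AI interval
    final: float    # estimate after seeing the AI interval
    truth: float    # ground-truth value


def midpoint(t: Trial) -> float:
    """Midpoint of the AI interval, used here as the point the advice 'pulls' toward."""
    return (t.lower + t.upper) / 2.0


def reliance_quantity(t: Trial) -> Optional[float]:
    """ASSUMED proxy for behavioral adjustment: WoA computed against the midpoint.

    0 means the estimate did not move, 1 means it moved all the way to the midpoint.
    Returns None when the initial estimate already equals the midpoint (undefined).
    """
    gap = midpoint(t) - t.initial
    if gap == 0:
        return None
    return (t.final - t.initial) / gap


def reliance_quality(t: Trial) -> float:
    """ASSUMED proxy for error improvement: drop in absolute error from initial to final.

    Positive values mean the final estimate is closer to the truth than the initial one.
    """
    return abs(t.initial - t.truth) - abs(t.final - t.truth)


# Two trials with identical behavior but different ground truths.
helpful = Trial(initial=10.0, lower=18.0, upper=22.0, final=15.0, truth=19.0)
harmful = Trial(initial=10.0, lower=18.0, upper=22.0, final=15.0, truth=8.0)

for name, trial in [("helpful", helpful), ("harmful", harmful)]:
    print(name, reliance_quantity(trial), reliance_quality(trial))
```

Both trials show the same shift toward the midpoint, so a quantity-only measure such as WoA treats them as identical. Only the quality measure reveals that the shift helped in one case and hurt in the other, which is the kind of distinction the framework is meant to surface.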

Best for: AI Scientist, Research Scientist, AI Ethicist

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.