SAVOIR: Learning Social Savoir-Faire via Shapley-based Reward Attribution

2026-04-22 · Source: cs.AI updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Natural Language Processing · Depth: Expert, extended

Summary

The Savoir framework, grounded in cooperative game theory, addresses the credit assignment problem in training socially intelligent language agents via reinforcement learning. It combines two components. Expected utility provides a prospective valuation that captures an utterance's strategic potential for future dialogue trajectories. Shapley values distribute credit fairly and carry axiomatic guarantees (efficiency, symmetry, null player, and additivity). Evaluated on the SOTOPIA benchmark, Savoir achieves new state-of-the-art performance across all settings, and its 7B model matches or exceeds proprietary models such as GPT-4o and Claude-3.5-Sonnet. Notably, large reasoning models consistently underperform on this benchmark, suggesting that social intelligence requires capabilities distinct from analytical reasoning. Human evaluations further validate that Savoir produces more strategic responses and assigns credit better.
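For reference, the standard Shapley value is restated below with the utterances of one dialogue as players. Defining the characteristic function v(S) as the expected final goal score of rollouts that keep only the utterances in S is an assumption about SAVOIR's exact formulation. The symbols τ, p(· | S), and R are illustrative and not taken from the paper.

```latex
% Players: utterances N = \{1, \dots, n\} of one dialogue (assumed setup).
% Assumed characteristic function: expected goal score of rollouts conditioned on S.
v(S) = \mathbb{E}_{\tau \sim p(\cdot \mid S)}\big[ R(\tau) \big]

% Shapley value of utterance i: weighted average of its marginal contributions.
\phi_i(v) = \sum_{S \subseteq N \setminus \{i\}} \frac{|S|!\,(n - |S| - 1)!}{n!} \big[ v(S \cup \{i\}) - v(S) \big]

% Efficiency axiom: the credits exactly account for the total gain.
\sum_{i \in N} \phi_i(v) = v(N) - v(\varnothing)
```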

Key takeaway

For AI engineers developing socially intelligent agents, Savoir offers a theoretically grounded approach to reward modeling. You should consider integrating expected utility and Shapley values into your reinforcement learning pipelines to move beyond heuristic credit assignment. This can lead to agents that behave more strategically and in a more human-aligned way. On SOTOPIA, Savoir's 7B model matches or exceeds proprietary models such as GPT-4o and Claude-3.5-Sonnet.

Key insights

Savoir uses game theory's Shapley values and expected utility for principled, forward-looking credit assignment in social reinforcement learning.

Principles

Expected utility captures strategic potential.
Shapley values ensure fair credit distribution.
Social intelligence differs from analytical reasoning.
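To make "fair credit distribution" concrete, the sketch below computes exact Shapley values by enumerating every coalition. This is feasible only for a handful of utterances, since it needs on the order of 2^n value calls. The three-utterance `toy_value` game is hypothetical: utterance 0 is worth 2 on its own, utterance 1 is worth 1, and utterances 0 and 2 add 3 only when both are present. The final line illustrates the efficiency axiom, because the credits sum to v(N) − v(∅) = 6.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Dict, FrozenSet, Sequence


def exact_shapley(players: Sequence[int],
                  value: Callable[[FrozenSet[int]], float]) -> Dict[int, float]:
    """Exact Shapley values by enumerating all coalitions (exponential cost)."""
    n = len(players)
    phi: Dict[int, float] = {}
    for i in players:
        others = [p for p in players if p != i]
        total = 0.0
        for size in range(n):
            # Weight |S|! (n - |S| - 1)! / n!, shared by every coalition of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                total += weight * (value(s | {i}) - value(s))  # marginal contribution
        phi[i] = total
    return phi


def toy_value(coalition: FrozenSet[int]) -> float:
    # Hypothetical expected goal score when only these utterances are kept.
    score = 0.0
    if 0 in coalition:
        score += 2.0
    if 1 in coalition:
        score += 1.0
    if 0 in coalition and 2 in coalition:
        score += 3.0  # interaction: 0 and 2 only pay off together
    return score


phi = exact_shapley([0, 1, 2], toy_value)
print({k: round(v, 6) for k, v in phi.items()})  # → {0: 3.5, 1: 1.0, 2: 1.5}
print(round(sum(phi.values()), 6))  # → 6.0
```

The interaction bonus of 3 is split evenly between utterances 0 and 2. Utterance 1 receives exactly its standalone worth, which is the kind of axiomatic behavior heuristic credit assignment does not guarantee.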

Method

Savoir computes utterance-level rewards by sampling coalitions, evaluating their expected utility via Monte Carlo rollouts, and then solving a weighted linear regression using KernelSHAP to derive Shapley values for fair credit assignment.
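The method above can be sketched as a small KernelSHAP estimator in Python with NumPy. This is a minimal illustration under stated assumptions, not code from jyyyyy0/SAVOIR. The `rollout` interface, the noisy `toy_rollout` game, and the uniform-subset sampler are hypothetical stand-ins for an LLM-driven dialogue simulator. Each sampled coalition's value is a Monte Carlo average over rollouts. The Shapley values are then the solution of a kernel-weighted least-squares problem, with the efficiency constraint imposed by eliminating the last player's variable.

```python
import numpy as np
from math import comb
from typing import Callable

# Hypothetical rollout interface: given a boolean mask of the utterances kept in a
# coalition, simulate one dialogue continuation and return its goal score.
Rollout = Callable[[np.ndarray, np.random.Generator], float]


def shapley_kernel(n: int, s: int) -> float:
    """KernelSHAP weight of a single coalition of size s, for 0 < s < n."""
    return (n - 1) / (comb(n, s) * s * (n - s))


def expected_utility(mask: np.ndarray, rollout: Rollout, k: int,
                     rng: np.random.Generator) -> float:
    """Monte Carlo estimate of a coalition's value: mean score over k rollouts."""
    return float(np.mean([rollout(mask, rng) for _ in range(k)]))


def kernel_shap(n: int, rollout: Rollout, num_samples: int = 128,
                num_rollouts: int = 8, seed: int = 0) -> np.ndarray:
    """Approximate Shapley values of n utterances via KernelSHAP regression."""
    rng = np.random.default_rng(seed)
    v_empty = expected_utility(np.zeros(n, dtype=bool), rollout, num_rollouts, rng)
    v_full = expected_utility(np.ones(n, dtype=bool), rollout, num_rollouts, rng)
    delta = v_full - v_empty  # efficiency: the credits must sum to this

    rows, targets, weights = [], [], []
    while len(rows) < num_samples:
        # Every non-trivial subset is equally likely here, so the kernel weight
        # alone is the correct importance weight (up to a constant factor).
        mask = rng.random(n) < 0.5
        s = int(mask.sum())
        if s == 0 or s == n:
            continue  # empty/full coalitions enter through the constraint instead
        z = mask.astype(float)
        y = expected_utility(mask, rollout, num_rollouts, rng) - v_empty
        # Substitute phi[-1] = delta - sum(phi[:-1]) so efficiency holds exactly.
        rows.append(z[:-1] - z[-1])
        targets.append(y - z[-1] * delta)
        weights.append(shapley_kernel(n, s))

    # Weighted least squares: scale each row and target by sqrt(weight).
    sw = np.sqrt(np.asarray(weights))
    A = np.asarray(rows) * sw[:, None]
    b = np.asarray(targets) * sw
    phi_head, *_ = np.linalg.lstsq(A, b, rcond=None)
    return np.append(phi_head, delta - phi_head.sum())


def toy_rollout(mask: np.ndarray, rng: np.random.Generator) -> float:
    # Hypothetical game: utterance 1 helps on its own, utterances 0 and 2 only
    # pay off together, and utterance 3 is irrelevant; rollouts are noisy.
    score = 1.0 * mask[1] + 3.0 * (mask[0] and mask[2])
    return float(score + rng.normal(scale=0.1))


phi = kernel_shap(4, toy_rollout, num_samples=256, num_rollouts=4)
print(np.round(phi, 2))  # compare with the exact Shapley values [1.5, 1.0, 1.5, 0.0]
```

This sketch enforces efficiency through substitution rather than by adding heavily weighted empty and full rows, which keeps the least-squares system well conditioned. The trade-off is that any Monte Carlo error in v(N) and v(∅) passes directly into the credits.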

In practice

Train reward models with 7,500 utterance-level annotations.
Use KernelSHAP for efficient Shapley value approximation.
Prioritize extreme coalition sizes (very small and nearly complete coalitions) in sampling.
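The advice to prioritize extreme coalition sizes follows from the Shapley kernel itself. Summed over all C(n, s) coalitions of size s, the kernel mass is (n − 1) / (s(n − s)), which peaks at s = 1 and s = n − 1. The sketch below assumes one way to act on this: draw each coalition size in proportion to that mass, then choose members uniformly at random. The function names are hypothetical, and SAVOIR's actual sampler may differ. Another common option is to enumerate the smallest and largest sizes exhaustively before sampling the rest.

```python
import numpy as np


def size_distribution(n: int) -> np.ndarray:
    """P(size = s) for s = 1..n-1, proportional to the total Shapley-kernel
    mass of all coalitions of that size, (n - 1) / (s * (n - s))."""
    sizes = np.arange(1, n)
    mass = (n - 1) / (sizes * (n - sizes))
    return mass / mass.sum()


def sample_coalitions(n: int, num_samples: int, seed: int = 0) -> np.ndarray:
    """Draw boolean coalition masks: a size from size_distribution, then
    uniformly random members of that size."""
    rng = np.random.default_rng(seed)
    sizes = rng.choice(np.arange(1, n), size=num_samples, p=size_distribution(n))
    masks = np.zeros((num_samples, n), dtype=bool)
    for row, s in enumerate(sizes):
        masks[row, rng.choice(n, size=s, replace=False)] = True
    return masks


print(np.round(size_distribution(6), 3))  # → [0.263 0.164 0.146 0.164 0.263]
masks = sample_coalitions(6, num_samples=1000)
print(np.bincount(masks.sum(axis=1), minlength=6)[1:])  # counts for sizes 1..5
```

When coalitions are drawn this way, each coalition's sampling probability is already proportional to its kernel weight. The downstream weighted regression can therefore give every sample equal weight.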

Topics

Savoir Framework
Social Intelligence
Reinforcement Learning
Shapley Values
Expected Utility

Code references

jyyyyy0/SAVOIR

Best for: AI Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.