From Shortcuts to Reasoning: Robust Post-Training of Theory of Mind with Reinforcement Learning
Summary
A new study introduces Thinking-RFT, a Reinforcement Fine-Tuning method, to address the pervasive "shortcut" issue in Theory of Mind (ToM) datasets for foundation models. On existing datasets, models can reach up to 99% accuracy by exploiting spurious correlations, which creates a false impression of ToM ability. The researchers developed a framework to identify these shortcuts and found that "belief" questions are more prone to them than "intention" questions. Applied across four shortcut-free datasets and three ToM contexts, Thinking-RFT, which uses verifiable rewards and explicit reasoning chains, achieved a 6% overall improvement over Supervised Fine-Tuning (SFT). This included a 10% improvement in complex higher-order reasoning and 7% in multimodal cases, alongside better generalization and robustness. The study attributes a 7% average improvement over Non-Thinking-RFT to the joint effect of reasoning and RL, which grounds the model's reasoning on anchor cues.
Key takeaway
If you are an AI scientist developing or fine-tuning foundation models for real-world applications that require Theory of Mind, critically evaluate your ToM datasets for "shortcut" correlations, especially in "belief" questions. Consider Thinking-RFT, which combines explicit reasoning chains with reinforcement learning, to achieve more robust and generalizable ToM capabilities, particularly for higher-order reasoning and multimodal scenarios. The study reports that this approach improves performance by 6-10% over SFT.
Key insights
Thinking-RFT, combining reasoning and RL, robustly improves Theory of Mind in foundation models by avoiding dataset shortcuts.
Principles
- ToM dataset accuracy can be confounded by "shortcut" issues exploiting spurious correlations.
- Questions reducible to pure state tracking are more shortcut-prone than those requiring reasoning.
- Robust ToM improvement requires grounding reasoning on causal factors like anchor cues.
Method
Thinking-RFT applies Reinforcement Fine-Tuning with verifiable rewards and explicit reasoning chains to shortcut-free datasets in order to strengthen Theory of Mind capabilities.
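To make "verifiable rewards and explicit reasoning chains" concrete, here is a minimal sketch of what such a reward could look like. It is an illustration, not the study's implementation: the `<think>`/`<answer>` tag format, the `verifiable_reward` name, and the `format_bonus` weight are all assumptions made for this example.

```python
import re

# Assumed (hypothetical) output format: a reasoning chain followed by an answer.
#   <think> ...reasoning... </think><answer> ...final answer... </answer>
_COMPLETION_PATTERN = re.compile(
    r"^\s*<think>(.+?)</think>\s*<answer>(.+?)</answer>\s*$",
    re.DOTALL,
)


def verifiable_reward(completion: str, gold: str, format_bonus: float = 0.1) -> float:
    """Score one model completion against a known gold answer.

    Returns 0.0 if the completion lacks an explicit reasoning chain and answer
    in the assumed format. Otherwise returns `format_bonus`, plus 1.0 when the
    extracted answer matches the gold answer (case-insensitive exact match).
    """
    match = _COMPLETION_PATTERN.match(completion)
    if match is None:
        return 0.0  # no explicit reasoning chain, so nothing to verify
    answer = match.group(2).strip().lower()
    correct = 1.0 if answer == gold.strip().lower() else 0.0
    return format_bonus + correct


if __name__ == "__main__":
    good = "<think>Sally left before the ball was moved.</think><answer>basket</answer>"
    print(verifiable_reward(good, "basket"))      # well-formed and correct
    print(verifiable_reward("basket", "basket"))  # correct but no reasoning chain
```

The reward is computed by a rule rather than a learned judge, which is what makes it verifiable. An RL trainer would call a function like this on each sampled completion.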
In practice
- Systematically examine ToM datasets for "shortcut" issues.
- Prioritize "intention" questions over "belief" questions for robust ToM evaluation.
- Implement explicit reasoning chains in RL fine-tuning.
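One simple way to start examining a dataset for shortcuts is to measure how well a heuristic that ignores mental states can answer its questions. The sketch below is a generic probe and not the framework developed in the study. The `ToMExample` fields and the `first_mentioned_option` heuristic are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class ToMExample:
    """Hypothetical schema for one multiple-choice ToM item."""
    story: str
    question: str
    options: tuple[str, ...]
    answer: str


def first_mentioned_option(example: ToMExample) -> str:
    """Shortcut heuristic: pick the option that appears earliest in the story.

    It never models what any character knows or wants, so a high score
    suggests the dataset can be solved without ToM reasoning.
    """
    story = example.story.lower()

    def position(option: str) -> int:
        index = story.find(option.lower())
        return index if index >= 0 else len(story)  # unmentioned options rank last

    return min(example.options, key=position)


def shortcut_accuracy(
    examples: Sequence[ToMExample],
    heuristic: Callable[[ToMExample], str],
) -> float:
    """Fraction of examples the heuristic answers correctly."""
    if not examples:
        return 0.0
    hits = sum(heuristic(example) == example.answer for example in examples)
    return hits / len(examples)


if __name__ == "__main__":
    data = [
        ToMExample(
            story="Sally puts the ball in the basket and leaves. Anne moves the ball to the box.",
            question="Where will Sally look for the ball?",
            options=("box", "basket"),
            answer="basket",
        ),
        ToMExample(
            story="Tom puts the key in the drawer. Tom watches Ana move the key to the shelf.",
            question="Where will Tom look for the key?",
            options=("drawer", "shelf"),
            answer="shelf",
        ),
    ]
    print(shortcut_accuracy(data, first_mentioned_option))
```

If a heuristic like this scores far above chance on a real dataset, accuracy on that dataset says little about ToM. Splitting the score by question type (for example, "belief" versus "intention") shows where the shortcut is concentrated.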
Topics
- Theory of Mind
- Reinforcement Learning
- Foundation Models
- Shortcut Learning
- Post-Training
- Supervised Fine-Tuning
Best for: Research Scientist, AI Scientist, Machine Learning Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Machine Learning.