Mental-R1: Aligning LLM Reasoning for Mental Health Assessment
Summary
Mental-R1 is a model trained with Cognitive Relative Policy Optimization (CRPO) to align large language model (LLM) reasoning with human cognitive processes for mental health assessment. Developed by researchers at the University of Oxford, CRPO is a reinforcement learning framework that extends group relative policy optimization (GRPO) with stage-dependent uncertainty modeling. Its stage-wise entropy regularization encourages broad exploration in the early reasoning stages and confident decision-making in the later ones, mirroring how human assessment moves from uncertainty to certainty. Drawing on cognitive appraisal theory, CRPO formalizes reasoning as a sequence of stages: stimulus, primary appraisal, secondary appraisal, reaction, and mental state. In experiments across 8 mental health datasets, CRPO improves weighted F1-score by an average of 10.4 percentage points over the best reinforcement learning baseline, with clear advantages on reasoning-intensive cases.
Key takeaway
Machine Learning Engineers developing LLMs for mental health assessment should consider cognition-aligned reinforcement learning. CRPO combines stage-wise entropy regularization with reasoning stages grounded in cognitive appraisal theory, and it is reported to improve weighted F1-score by an average of 10.4 percentage points over the best reinforcement learning baseline across 8 datasets. In practice, enforce structured reasoning paths with XML-like tags and use balanced rewards to improve accuracy and interpretability, especially on complex, reasoning-intensive cases.
Key insights
Aligning LLM reasoning with human cognitive processes significantly improves mental health assessment accuracy.
Principles
- Human cognition transitions from uncertainty to certainty during assessment.
- Mental health assessment benefits from theory-grounded cognitive stages.
Method
CRPO extends GRPO by integrating stage-dependent entropy regularization and formalizing cognitive appraisal theory stages (Stimulus, Primary Appraisal, Secondary Appraisal, Reaction, Mental State) with tailored rewards.
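The two ingredients named above can be illustrated with a minimal Python sketch. It is not the authors' implementation. The linear coefficient schedule, the `beta` value, and all function names are assumptions made for illustration, since the summary only states that early stages should favor exploration and later stages should favor confidence. The group-relative advantage follows the standard GRPO idea of normalizing each reward against the other samples in its group.

```python
import math
from statistics import mean, pstdev

# Stage names taken from the summary; the identifiers themselves are illustrative.
STAGES = ["stimulus", "primary_appraisal", "secondary_appraisal",
          "reaction", "mental_state"]

def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantage: normalize each reward within its sampled group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

def stage_entropy_coefficients(beta=0.01):
    """Assumed linear schedule: +beta at the first stage, -beta at the last.

    A positive coefficient rewards high entropy (exploration); a negative
    one penalizes it (confident decisions).
    """
    n = len(STAGES)
    return {s: beta * (1 - 2 * i / (n - 1)) for i, s in enumerate(STAGES)}

def token_entropy(probs):
    """Shannon entropy (in nats) of one next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def entropy_bonus(stage_entropies, coeffs):
    """Weighted sum of per-stage mean entropies, added to the RL objective."""
    return sum(coeffs[s] * h for s, h in stage_entropies.items())
```

With this schedule, a response that is uncertain during the stimulus stage and confident at the mental state stage receives a larger bonus than one showing the opposite pattern, which is the qualitative behavior the summary describes.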
In practice
- Use XML-like tags to enforce theory-grounded reasoning stages.
- Apply balanced rewards for class and dataset imbalance in RL training.
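As a rough illustration of both practices, the Python sketch below checks that a response contains each reasoning stage exactly once and in order, then scales the correctness reward by inverse class frequency. The tag names, the weighting formula, and the function names are hypothetical, because the summary does not give the exact tag strings or the reward definition. Only class imbalance is handled here; dataset imbalance could be treated with an analogous per-dataset weight.

```python
import re
from collections import Counter

# Hypothetical tag names derived from the stage names in the summary.
STAGE_TAGS = ["stimulus", "primary_appraisal", "secondary_appraisal",
              "reaction", "mental_state"]

def parse_stages(text):
    """Return {tag: content} if every tag appears once and in order, else None."""
    pos = 0
    stages = {}
    for tag in STAGE_TAGS:
        pattern = re.compile(rf"<{tag}>(.*?)</{tag}>", re.DOTALL)
        match = pattern.search(text, pos)  # only look after the previous stage
        if match is None or len(re.findall(rf"<{tag}>", text)) != 1:
            return None
        stages[tag] = match.group(1).strip()
        pos = match.end()
    return stages

def class_weights(labels):
    """Inverse-frequency weights, so rare classes earn larger rewards."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}

def balanced_reward(response, gold, weights):
    """Zero reward for malformed output; class-weighted reward when correct."""
    stages = parse_stages(response)
    if stages is None:
        return 0.0
    return weights[gold] if stages["mental_state"] == gold else 0.0
```

Gating the reward on a successful parse means a policy cannot collect accuracy reward while skipping stages, which is one simple way to keep the reasoning path inspectable.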
Topics
- Mental Health Assessment
- Large Language Models
- Reinforcement Learning
- Cognitive Appraisal Theory
- F1-score
- Stage-wise Entropy Regularization
- Mental-R1
Best for: AI Scientist, Machine Learning Engineer, Research Scientist
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.