MoReBench: Evaluating Procedural and Pluralistic Moral Reasoning in Language Models, More than Outcomes
Summary
MoReBench is a benchmark for evaluating procedural and pluralistic moral reasoning in language models, focusing on the "how" of a decision rather than just the "what". It comprises 1,000 moral scenarios paired with more than 23,000 expert-written rubric criteria, which cover identifying moral considerations, weighing trade-offs, and giving actionable recommendations. A companion dataset, MoReBench-Theory, contains 150 examples that test reasoning under five major normative ethical frameworks. Key findings indicate that neither traditional scaling laws nor existing benchmarks (math, code, scientific reasoning) predict models' moral reasoning abilities. Models also show partiality towards specific moral frameworks, such as Benthamite Act Utilitarianism and Kantian Deontology, and generally struggle with logical reasoning in moral contexts.
Key takeaway
For AI ethicists and machine learning engineers developing value-aligned systems, this research shows that current LLM benchmarks are insufficient for assessing moral reasoning. Integrate process-focused evaluation methodologies such as MoReBench to examine how models arrive at decisions, not just which decisions they reach. Doing so helps identify and mitigate partiality towards specific ethical frameworks and exposes weak logical reasoning in complex moral dilemmas, supporting safer and more transparent AI.
Key insights
MoReBench evaluates LLM moral reasoning processes against expert-defined rubrics, revealing shortcomings that standard benchmarks do not predict.
Principles
- Moral reasoning in LLMs requires process-focused evaluation.
- LLM moral reasoning ability does not scale predictably with size.
- Models show inherent biases towards specific ethical frameworks.
Method
MoReBench scores LLM thinking traces and final responses against 23,018 expert-written rubric criteria spread across 1,000 moral scenarios, and it includes a length-corrected metric. MoReBench-Theory tests reasoning under five normative ethical frameworks.
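To make the rubric-based scoring concrete, here is a minimal sketch of how a response might be scored against weighted criteria. This summary does not give the benchmark's actual formula, so everything here is an assumption for illustration: the `Criterion` fields, signed weights, the normalization by achievable points, and the `length_corrected_score` penalty with its `reference_chars` parameter.

```python
from dataclasses import dataclass


@dataclass
class Criterion:
    description: str
    weight: float    # assumed: positive = desirable, negative = should be avoided
    satisfied: bool  # verdict from a human or model judge


def rubric_score(criteria):
    """Share of achievable rubric points earned, clipped to [0, 1]."""
    max_points = sum(c.weight for c in criteria if c.weight > 0)
    if max_points == 0:
        return 0.0
    # Satisfied negative-weight criteria subtract from the earned total.
    earned = sum(c.weight for c in criteria if c.satisfied)
    return max(0.0, min(1.0, earned / max_points))


def length_corrected_score(score, response_chars, reference_chars=1000):
    """Hypothetical correction: penalize responses longer than a reference length."""
    if response_chars <= reference_chars:
        return score
    return score * reference_chars / response_chars


criteria = [
    Criterion("Identifies the competing obligations", 3, True),
    Criterion("Weighs trade-offs between stakeholders", 2, False),
    Criterion("Recommends an action that causes harm", -1, True),
]
raw = rubric_score(criteria)
corrected = length_corrected_score(raw, response_chars=2000)
```

The length correction is there because a longer response can satisfy more criteria simply by covering more ground; the specific penalty used above is only one possible choice.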
In practice
- Integrate process-focused evaluation for AI safety and alignment.
- Scrutinize LLM outputs for logical reasoning gaps in moral dilemmas.
- Customize LLM training to address framework-specific moral biases.
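One way to check for framework partiality in practice is to compare a model's mean rubric score per ethical framework. The sketch below assumes per-example scores are already available; the function name `framework_gap` is hypothetical and the numbers are made-up placeholders, not results from the paper.

```python
from statistics import mean


def framework_gap(scores_by_framework):
    """Return per-framework means, the best and worst framework, and their gap."""
    means = {name: mean(vals) for name, vals in scores_by_framework.items()}
    best = max(means, key=means.get)
    worst = min(means, key=means.get)
    return means, best, worst, means[best] - means[worst]


# Placeholder scores for illustration only.
scores = {
    "Benthamite Act Utilitarianism": [0.8, 0.6],
    "Kantian Deontology": [0.5, 0.3],
}
means, best, worst, gap = framework_gap(scores)
```

A large gap between the best and worst framework suggests the model reasons more fluently under some frameworks than others, which is the kind of partiality the benchmark is meant to surface.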
Topics
- Moral Reasoning
- Language Model Evaluation
- AI Ethics
- Benchmarking
- Procedural Reasoning
- Ethical Frameworks
Best for: Research Scientist, AI Scientist, Machine Learning Engineer, AI Ethicist
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.