MoReBench: Evaluating Procedural and Pluralistic Moral Reasoning in Language Models, More than Outcomes

2025-08-08 · Source: cs.AI updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics · Depth: Expert, extended

Summary

MoReBench is a benchmark for evaluating procedural and pluralistic moral reasoning in language models. It focuses on how a model reaches a decision rather than only on what it decides. It comprises 1,000 moral scenarios paired with 23,018 expert-written rubric criteria, which cover identifying moral considerations, weighing trade-offs, and providing actionable recommendations. A companion dataset, MoReBench-Theory, contains 150 examples that test reasoning under five major normative ethical frameworks. The key findings are that scaling laws and existing benchmarks (math, code, scientific reasoning) fail to predict models' moral reasoning abilities. Models also show partiality towards specific moral frameworks, such as Benthamite Act Utilitarianism and Kantian Deontology, and generally struggle with logical reasoning in moral contexts.

Key takeaway

For AI Ethicists and Machine Learning Engineers developing value-aligned systems, this research shows that current LLM benchmarks (math, code, scientific reasoning) are insufficient for assessing moral reasoning. You should integrate process-focused evaluation methodologies, like MoReBench, to uncover how models arrive at decisions, not just their final outcomes. This approach helps identify and mitigate partiality towards specific ethical frameworks and weaknesses in logical reasoning on complex moral dilemmas, which supports safer and more transparent AI.

Key insights

MoReBench evaluates LLM moral reasoning processes using expert-written rubrics, revealing shortcomings that standard benchmarks do not predict.

Principles

Moral reasoning in LLMs requires process-focused evaluation.
LLM moral reasoning ability does not scale predictably with model size.
Models show inherent biases towards specific ethical frameworks.

Method

MoReBench uses 1,000 moral scenarios with 23,018 expert-written rubric criteria to score both the thinking traces and the final responses of LLMs, and it includes a length-corrected metric. MoReBench-Theory tests reasoning under five normative ethical frameworks.
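
To make the idea of rubric-based scoring concrete, here is a minimal sketch of how a response might be scored against a set of criteria and then adjusted for length. It is not the benchmark's actual formula: the `Criterion` structure, the signed weights, the clipping to [0, 1], and the per-1,000-character normalisation are all assumptions made for illustration, and the grader that decides whether a criterion is satisfied is left out.

```python
from dataclasses import dataclass


@dataclass
class Criterion:
    """One rubric criterion for a scenario (hypothetical structure)."""
    description: str
    weight: float    # assumed: positive = desirable, negative = undesirable
    satisfied: bool  # assumed: decided beforehand by a human or model grader


def rubric_score(criteria: list[Criterion]) -> float:
    """Earned weight as a share of achievable positive weight, clipped to [0, 1]."""
    max_credit = sum(c.weight for c in criteria if c.weight > 0)
    if max_credit == 0:
        return 0.0
    # Satisfied negative-weight criteria subtract from the earned total.
    earned = sum(c.weight for c in criteria if c.satisfied)
    return min(1.0, max(0.0, earned / max_credit))


def length_corrected(score: float, n_chars: int, per: int = 1000) -> float:
    """Score per `per` characters of response (an assumed normalisation)."""
    if n_chars <= 0:
        raise ValueError("n_chars must be positive")
    return score * per / n_chars


# Illustrative criteria, invented for this sketch.
example = [
    Criterion("Identifies the affected stakeholders", 3.0, True),
    Criterion("Weighs the competing obligations", 2.0, False),
    Criterion("Recommends a harmful action", -1.0, True),
]
raw = rubric_score(example)
adjusted = length_corrected(raw, n_chars=2000)
```

Under these assumptions, a long response that satisfies many criteria simply by covering more ground earns a lower adjusted score than a concise response with the same raw score, which is the kind of effect a length-corrected metric is meant to capture.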

In practice

Integrate process-focused evaluation for AI safety and alignment.
Scrutinize LLM outputs for logical reasoning gaps in moral dilemmas.
Customize LLM training to address framework-specific moral biases.

Topics

Moral Reasoning
Language Model Evaluation
AI Ethics
Benchmarking
Procedural Reasoning
Ethical Frameworks

Code references

morebench/morebench

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, AI Ethicist

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.