Are LLMs Bad at Moral Reasoning?
Summary
A new analysis challenges earlier pessimistic conclusions about the moral reasoning capabilities of large language models (LLMs), particularly those drawn from the MoReBench dataset. Earlier research benchmarked frontier AI models against 1,000 gold-standard human-authored rubrics for moral reasoning across various cases, and the results were underwhelming. This paper argues that when LLMs are asked to generate scoring rubrics for the moral analysis of cases, rather than to produce open-ended responses, they appear significantly more capable. The LLM-generated rubrics show better calibration to the human-authored rubrics. Where discrepancies exist, they are attributed to the vast dimensionality of moral problems or to human departures from the rubric-creation guidelines, which suggests that LLMs possess greater moral reasoning capacity than initially believed.
Key takeaway
If you are an AI scientist or ethicist designing or interpreting moral reasoning benchmarks, critically re-evaluate the task you give to large language models. If your current evaluations score open-ended LLM responses, consider shifting to a rubric generation task. This approach may reveal significantly higher moral competence in LLMs, which would suggest that current pessimistic conclusions stem from methodological choices rather than inherent AI limitations. Adjusting your evaluation framework could lead to more accurate assessments.
Key insights
Re-tasking LLMs to generate moral reasoning rubrics reveals significantly higher moral competence than prior evaluations.
Principles
- Moral competence involves identifying and responding to moral reasons.
- Evaluation task design critically influences perceived AI capabilities.
- Moral problems often possess vast dimensionality.
Method
LLMs are given the task of generating scoring rubrics for the moral analysis of cases, and the generated rubrics are then compared against human-authored gold standards.
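The comparison step can be illustrated with a rough sketch. The summary does not say how the paper matches generated rubrics to human ones, so everything here is an assumption made for illustration: the function names (`jaccard`, `rubric_coverage`), the word-overlap similarity measure, and the 0.3 threshold are hypothetical and are not the authors' method. The sketch reports what fraction of human-authored criteria have a sufficiently similar LLM-generated counterpart, and which criteria are left unmatched.

```python
import re


def tokens(text):
    """Lowercase a criterion and return its set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def jaccard(a, b):
    """Word-level Jaccard similarity between two rubric criteria."""
    ta, tb = tokens(a), tokens(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def rubric_coverage(human, generated, threshold=0.3):
    """Return (coverage, unmatched) for human criteria versus generated ones.

    A human criterion counts as covered when its most similar generated
    criterion scores at or above `threshold` (an assumed, arbitrary cutoff).
    """
    unmatched = []
    for criterion in human:
        best = max((jaccard(criterion, g) for g in generated), default=0.0)
        if best < threshold:
            unmatched.append(criterion)
    coverage = (len(human) - len(unmatched)) / len(human) if human else 0.0
    return coverage, unmatched


if __name__ == "__main__":
    # Hypothetical criteria, invented for this example only.
    human_rubric = [
        "Identifies the harm to the patient",
        "Considers the duty of confidentiality",
        "Weighs the interests of third parties",
    ]
    llm_rubric = [
        "Identifies potential harm to the patient",
        "Discusses the duty of confidentiality owed by the doctor",
    ]
    coverage, unmatched = rubric_coverage(human_rubric, llm_rubric)
    print(f"coverage: {coverage:.2f}")
    print("unmatched human criteria:", unmatched)
```

Word overlap is a crude proxy, since common words such as "the" inflate the score and paraphrases are missed. A real evaluation would more plausibly rely on human judgment or a semantic similarity model. The unmatched list is the useful part: those are the discrepancies to inspect before concluding that the model has failed.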
In practice
- Re-evaluate AI systems by altering the task design.
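- When LLM and human rubrics diverge, check whether the gap reflects the vast dimensionality of the moral problem or a human departure from rubric guidelines before treating it as a model failure.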
Topics
- Large Language Models
- Moral Reasoning
- AI Ethics
- MoReBench Dataset
- Benchmark Design
- AI Evaluation
Best for: Research Scientist, AI Scientist, AI Ethicist
Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.