Social Norm Reasoning in Multimodal Language Models: An Evaluation

· Source: cs.MA updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Robotics & Autonomous Systems · Depth: Advanced, extended

Summary

A new evaluation framework assesses the social norm reasoning of five Multimodal Large Language Models (MLLMs): GPT-4o, Gemini 2.0 Flash, Qwen-2.5VL (72B), Intern-VL3 (14B), and Meta LLaMa-4 Maverick. Researchers from the University of Otago evaluated these models on 30 text-based and 30 image-based stories. Each story depicts one of five social norms in one of six variants of adherence or violation (5 × 6 = 30 stories per modality), and the models' answers were compared with human ground truth. The models performed considerably better at text-based norm reasoning (95.33% average accuracy) than at image-based reasoning (83.58% average accuracy). GPT-4o achieved the highest accuracy in both modalities (98.75% text, 92.5% image), followed by the freely available Qwen-2.5VL (97.5% text, 85.41% image). All models struggled with complex "metanorms", which require multiple layers of reasoning.
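The headline figures can be sanity-checked with a little arithmetic. The sketch below assumes, without confirmation from this summary, that each per-model accuracy is the share of 240 question responses (30 stories × 8 questions) that matched the human ground truth, and that each modality average pools the five models (1,200 responses). `implied_correct` and the constant names are illustrative only.

```python
# Hypothetical sanity check. ASSUMPTION: each accuracy is the share of
# 30 stories x 8 questions = 240 responses that matched human ground truth.
STORIES_PER_MODALITY = 30
QUESTIONS_PER_STORY = 8
TOTAL = STORIES_PER_MODALITY * QUESTIONS_PER_STORY  # 240 responses per model per modality
NUM_MODELS = 5


def implied_correct(accuracy_pct: float, total: int = TOTAL) -> int:
    """Nearest whole number of correct responses implied by a reported percentage."""
    return round(accuracy_pct / 100 * total)


# Per-model figures quoted in the summary.
reported = {
    ("GPT-4o", "text"): 98.75,
    ("GPT-4o", "image"): 92.5,
    ("Qwen-2.5VL", "text"): 97.5,
    ("Qwen-2.5VL", "image"): 85.41,
}

for (model, modality), pct in reported.items():
    n = implied_correct(pct)
    print(f"{model:<11} {modality:<5} {pct:6.2f}% -> {n}/{TOTAL} = {100 * n / TOTAL:.4f}%")

# Modality averages, pooled over five models (5 x 240 = 1200 responses).
pooled = NUM_MODELS * TOTAL
for modality, avg in (("text", 95.33), ("image", 83.58)):
    n = implied_correct(avg, pooled)
    print(f"average {modality:<5} {avg:.2f}% -> {n}/{pooled} = {100 * n / pooled:.4f}%")
```

Under this assumption every figure resolves to a whole count, for example 237/240 for GPT-4o on text and 1,144/1,200 for the text average. Because every model answers the same number of questions, the mean of the five per-model accuracies equals the pooled accuracy, which is why the averages can be checked against 1,200. The Qwen-2.5VL image figure fits 205/240 = 85.4167% only if it was truncated rather than rounded.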

Key takeaway

For AI scientists developing socially intelligent agents, this research indicates that current MLLMs, particularly GPT-4o, reason robustly about norms from text but are less accurate with visual input. Systems should therefore either prioritize textual context for norm interpretation or incorporate stronger visual processing to handle complex social cues. Reasoning about metanorms remains a significant challenge, so critical applications may need further research or explicitly encoded rules.

Key insights

MLLMs reason about social norms more accurately from text than from images, and GPT-4o leads in both modalities.

Principles

Method

Evaluated five MLLMs on 30 text and 30 image stories (five norms × six adherence/violation variants), comparing their answers to eight questions per story against human ground truth to assess norm reasoning.
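The scoring protocol described in the method can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes each of the eight questions per story has one short reference answer and that a response counts as correct only on a case-insensitive exact match. The names `NORMS`, `VARIANTS`, `Story`, `build_stories`, and `score` are hypothetical, as are the placeholder norm and variant labels, since the summary does not name them.

```python
from collections import defaultdict
from dataclasses import dataclass
from itertools import product

# Hypothetical placeholders: the summary does not name the five norms or six variants.
NORMS = [f"norm_{i}" for i in range(5)]
VARIANTS = [f"variant_{j}" for j in range(6)]  # adherence/violation variants
QUESTIONS_PER_STORY = 8


@dataclass(frozen=True)
class Story:
    norm: str
    variant: str
    modality: str  # "text" or "image"


def build_stories(modality: str) -> list[Story]:
    """One story per (norm, variant) pair, giving 5 x 6 = 30 stories per modality."""
    return [Story(norm, variant, modality) for norm, variant in product(NORMS, VARIANTS)]


def score(
    model_answers: dict[Story, list[str]],
    ground_truth: dict[Story, list[str]],
) -> tuple[float, dict[str, float]]:
    """Overall and per-norm accuracy (%) against human ground truth.

    ASSUMPTION: a response is correct only on a case-insensitive exact match;
    a missing answer counts as wrong.
    """
    correct = total = 0
    per_norm: dict[str, list[int]] = defaultdict(lambda: [0, 0])  # norm -> [correct, total]
    for story, truth in ground_truth.items():
        answers = model_answers.get(story, [])
        for q, expected in enumerate(truth):
            hit = q < len(answers) and answers[q].strip().lower() == expected.strip().lower()
            correct += hit
            total += 1
            per_norm[story.norm][0] += hit
            per_norm[story.norm][1] += 1
    if total == 0:
        raise ValueError("ground truth is empty")
    return 100 * correct / total, {n: 100 * c / t for n, (c, t) in per_norm.items()}


# Toy usage with synthetic answers (not the study's data).
stories = build_stories("text")
truth = {s: ["yes"] * QUESTIONS_PER_STORY for s in stories}
answers = {s: list(t) for s, t in truth.items()}
answers[stories[0]][0] = "no"  # one deliberate mismatch
overall, by_norm = score(answers, truth)
print(len(stories), round(overall, 2), round(by_norm["norm_0"], 2))
```

Keeping a per-norm breakdown alongside the overall score makes it easier to see where a model fails, such as the metanorm cases the summary singles out, which a single aggregate accuracy can hide. Running `score` separately on the text and image story sets gives the per-modality comparison.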

Best for: AI Scientist, Research Scientist, AI Researcher, Machine Learning Engineer, Robotics Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.MA updates on arXiv.org.