ReactBench: A Benchmark for Topological Reasoning in MLLMs on Chemical Reaction Diagrams

Source: cs.AI updates on arXiv.org · Field: Technology & Digital (Artificial Intelligence & Machine Learning), Physical Sciences & Chemistry · Depth: Expert, extended

Summary

ReactBench is a benchmark for evaluating the topological reasoning of Multimodal Large Language Models (MLLMs) on complex chemical reaction diagrams. It comprises 1,618 expert-annotated question-answer pairs across four hierarchical task dimensions: spatial element localization, topological information extraction, pathway connectivity tracing, and structural topology reasoning. Evaluations of 17 MLLMs, including GPT-4o, Claude-3.5-Sonnet, and Qwen2.5-VL, reveal a performance gap exceeding 30% between anchor-based tasks (information extraction, pathway tracing) and holistic structural reasoning tasks (element localization, structural topology reasoning): models achieve over 80% accuracy on the former but fall below 56% on the latter. Controlled ablations indicate that this deficit stems from reasoning limitations rather than visual perception. The models struggle to perform hierarchical abstraction and to integrate local visual cues into a coherent global understanding of the structure.

Key takeaway

For AI Scientists and Machine Learning Engineers developing MLLMs, this research indicates a critical need to enhance models' hierarchical abstraction capabilities. Current MLLMs excel at local information extraction but fail to integrate these cues into a coherent global structural understanding, particularly for complex topological diagrams. Your development efforts should prioritize architectural improvements that enable robust multi-hop topological reasoning over integrated visual contexts, rather than solely scaling visual encoders or improving OCR.

Key insights

MLLMs struggle with global topological reasoning on complex diagrams, despite strong local perception.
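
To make the local versus global distinction concrete, the sketch below treats a reaction diagram as a directed graph. It is an illustration only and is not ReactBench's code or data: the compound labels, the edge list, and every function name are hypothetical. The assumption is that an anchor-based question needs only the arrows around one named node, while a holistic question (here, listing endpoints or detecting a cycle) requires visiting the whole structure.

```python
from collections import defaultdict

# Hypothetical toy diagram: nodes are compounds, edges are reaction arrows.
# A branches at B into C and D, and both paths converge on E.
EDGES = [("A", "B"), ("B", "C"), ("B", "D"), ("D", "E"), ("C", "E")]


def build_graph(edges):
    """Return the node set and a successor map for a list of (src, dst) arrows."""
    succ = defaultdict(list)
    nodes = set()
    for src, dst in edges:
        succ[src].append(dst)
        nodes.update((src, dst))
    return nodes, succ


def direct_products(succ, anchor):
    """Anchor-based query: inspect only the arrows leaving one named node."""
    return sorted(succ.get(anchor, []))


def endpoints(nodes, succ):
    """Holistic query: every node must be checked for outgoing arrows."""
    return sorted(n for n in nodes if not succ.get(n))


def has_cycle(nodes, succ):
    """Holistic query: depth-first search over the whole graph."""
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    color = {n: WHITE for n in nodes}

    def visit(n):
        color[n] = GRAY
        for m in succ.get(n, []):
            # Reaching a node that is still on the current path closes a loop.
            if color[m] == GRAY or (color[m] == WHITE and visit(m)):
                return True
        color[n] = BLACK
        return False

    return any(color[n] == WHITE and visit(n) for n in nodes)


nodes, succ = build_graph(EDGES)
print(direct_products(succ, "B"))  # local: looks at one node's arrows
print(endpoints(nodes, succ))      # global: scans all nodes
print(has_cycle(nodes, succ))      # global: traverses all paths
```

The local query touches a single adjacency list, whereas both global queries depend on every edge in the diagram. That is one plausible reading of why integrating local cues into a global structure is the harder step.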

Method

ReactBench evaluates MLLMs on chemical reaction diagrams using 1,618 QA pairs across four tasks: spatial element localization, topological information extraction, pathway connectivity tracing, and structural topology reasoning.
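
A minimal sketch of how per-task accuracy and the anchor-versus-holistic gap could be computed from graded answers follows. It is not the benchmark's evaluation harness: the task label strings, the record format, and the choice to average the two tasks in each group equally are assumptions, and the records are made-up toy data rather than reported results. The grouping of tasks follows the summary above.

```python
from collections import defaultdict

# Task grouping as described in the summary; the label strings are illustrative.
ANCHOR_TASKS = {"information_extraction", "pathway_tracing"}
HOLISTIC_TASKS = {"element_localization", "topology_reasoning"}


def per_task_accuracy(records):
    """Compute accuracy per task from an iterable of (task, is_correct) pairs."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for task, ok in records:
        totals[task] += 1
        hits[task] += int(ok)
    return {task: hits[task] / totals[task] for task in totals}


def group_gap(acc):
    """Mean anchor-task accuracy minus mean holistic-task accuracy.

    Assumption: each task in a group is weighted equally (macro average).
    """
    def mean(tasks):
        vals = [acc[t] for t in tasks if t in acc]
        return sum(vals) / len(vals)

    return mean(ANCHOR_TASKS) - mean(HOLISTIC_TASKS)


# Toy graded answers, invented for illustration only.
records = [
    ("information_extraction", True),
    ("information_extraction", True),
    ("pathway_tracing", True),
    ("pathway_tracing", False),
    ("element_localization", True),
    ("element_localization", False),
    ("topology_reasoning", False),
    ("topology_reasoning", False),
]

accuracy = per_task_accuracy(records)
print(accuracy)
print(group_gap(accuracy))
```

Reporting the gap per model, rather than pooled across models, would keep a strong model from masking a weak one. Whether the original evaluation aggregates this way is not stated in the summary.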

Best for: AI Scientist, Research Scientist, Machine Learning Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.