GeoSym127K: Scalable Symbolically-verifiable Synthesis for Multimodal Geometric Reasoning

· Source: cs.CV updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Mathematics & Computational Sciences · Depth: Expert, extended

Summary

GeoSym is a neuro-symbolic framework for improving the geometric reasoning of Large Multimodal Models (LMMs), which often suffer from visual hallucinations and from a shortage of precise mathematical Chain-of-Thought (CoT) data. At its core is the GeoSym Engine, an automated, scalable system that uses a type-conditional grammar and an analytic SymGT Solver to derive exact symbolic ground truths and render high-precision geometric diagrams. The engine was used to build GeoSym127K, a dataset of 51K high-resolution images, 127K questions with symbolic ground truths, and 55K answer-verified CoT QA pairs stratified by difficulty. Alongside it, the authors built GeoSym-Bench, an expert-curated benchmark of 511 complex samples, for rigorous evaluation. Supervised fine-tuning (SFT) on GeoSym data substantially improved performance on diagram-dependent and multi-step geometry tasks: a Qwen3-VL-8B model gained +22.21% on MathVerse Vision-Only and reached 61.52% (+6.19%) on WeMath. Reinforcement Learning with Verifiable Rewards (RLVR) raised performance further, indicating strong scaling potential for this verifiable reasoning synthesis.
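To make "exact symbolic ground truth" concrete, here is a minimal stdlib-only Python sketch, assuming a figure whose vertices have rational coordinates. It derives a triangle's area with the shoelace formula and the foot and length of an altitude using `fractions.Fraction`, returning the length as a simplified surd rather than a float. The triangle and all function names (`simplify_sqrt`, `sqrt_fraction`, `triangle_area`, `foot_of_altitude`) are hypothetical illustrations, not the SymGT Solver's actual interface, which the summary does not describe.

```python
from fractions import Fraction

# Hypothetical illustration of exact analytic derivation; not the SymGT Solver itself.
Point = tuple[Fraction, Fraction]


def simplify_sqrt(n: int) -> tuple[int, int]:
    """Return (a, b) with sqrt(n) == a * sqrt(b) and b squarefree (n >= 0)."""
    if n == 0:
        return 0, 1
    a, b, f = 1, n, 2
    while f * f <= b:
        while b % (f * f) == 0:  # pull each square factor f^2 out of the root
            b //= f * f
            a *= f
        f += 1
    return a, b


def sqrt_fraction(x: Fraction) -> tuple[Fraction, int]:
    """Exact sqrt(x) for x >= 0, as (coefficient, squarefree radicand)."""
    # sqrt(p/q) == sqrt(p*q) / q, which keeps the radicand an integer
    a, b = simplify_sqrt(x.numerator * x.denominator)
    return Fraction(a, x.denominator), b


def triangle_area(a: Point, b: Point, c: Point) -> Fraction:
    """Shoelace formula: half the absolute cross product of AB and AC."""
    cross = (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
    return abs(cross) / 2


def foot_of_altitude(a: Point, b: Point, c: Point) -> Point:
    """Orthogonal projection of C onto line AB, computed exactly."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    t = ((c[0] - a[0]) * dx + (c[1] - a[1]) * dy) / (dx * dx + dy * dy)
    return a[0] + t * dx, a[1] + t * dy


A = (Fraction(0), Fraction(0))
B = (Fraction(4), Fraction(2))
C = (Fraction(1), Fraction(4))

area = triangle_area(A, B, C)                    # Fraction(7, 1)
P = foot_of_altitude(A, B, C)                    # (Fraction(12, 5), Fraction(6, 5))
h_sq = (C[0] - P[0]) ** 2 + (C[1] - P[1]) ** 2   # Fraction(49, 5)
h = sqrt_fraction(h_sq)                          # (Fraction(7, 5), 5), i.e. 7*sqrt(5)/5
```

Because every intermediate value stays rational, the altitude 7·√5/5 is exact and can be stored as a ground truth; a floating-point pipeline would yield only an approximation such as 3.1304951…, comparable to a model's answer only up to a tolerance.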

Key takeaway

For research scientists developing or evaluating LMMs for mathematical reasoning, GeoSym demonstrates that integrating symbolically verifiable synthesis with structured fine-tuning and reinforcement learning is crucial. You should prioritize data generation pipelines that ensure mathematical exactness and visual-symbolic alignment to mitigate hallucinations and improve multi-step logical coherence. Consider adopting a neuro-symbolic approach to create robust, high-fidelity datasets and training paradigms for complex geometric problem-solving.

Key insights

GeoSym enhances LMM geometric reasoning through a neuro-symbolic framework generating verifiable, high-precision multimodal data.

Method

GeoSym uses a four-stage pipeline: the Builder performs topological evolution, the Drawer provides visual grounding via Connected Component Analysis, the SymGT Solver carries out analytic derivations, and the Generator verifies MLLM-generated CoT by checking its answers for algebraic equivalence with the symbolic ground truth.
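The Generator's algebraic-equivalence check can be illustrated with a minimal stdlib-only Python sketch. It assumes a hypothetical answer format in which each answer is a sum of terms `coefficient * sqrt(radicand)` with rational coefficients. Each answer is normalized to a map from squarefree radicand to total coefficient, and two answers are equal exactly when their maps match, because square roots of distinct squarefree integers are linearly independent over the rationals. The format and the names `canonical` and `algebraically_equal` are assumptions for illustration, not the paper's implementation; covering expressions such as π or trigonometric values would typically require a computer algebra system.

```python
from collections import defaultdict
from fractions import Fraction

# Hypothetical answer format: coefficient * sqrt(radicand) terms, summed.
# e.g. 2*sqrt(3) + 1/2  ->  [(Fraction(2), 3), (Fraction(1, 2), 1)]
Term = tuple[Fraction, int]


def simplify_sqrt(n: int) -> tuple[int, int]:
    """Return (a, b) with sqrt(n) == a * sqrt(b) and b squarefree (n >= 0)."""
    if n == 0:
        return 0, 1
    a, b, f = 1, n, 2
    while f * f <= b:
        while b % (f * f) == 0:  # move each square factor f^2 outside the root
            b //= f * f
            a *= f
        f += 1
    return a, b


def canonical(terms: list[Term]) -> dict[int, Fraction]:
    """Collect like surds: squarefree radicand -> summed coefficient (zeros dropped)."""
    acc: dict[int, Fraction] = defaultdict(Fraction)
    for coef, radicand in terms:
        if radicand < 0:
            raise ValueError("negative radicands are outside this sketch")
        outer, inner = simplify_sqrt(radicand)
        acc[inner] += Fraction(coef) * outer
    return {r: c for r, c in acc.items() if c != 0}


def algebraically_equal(predicted: list[Term], ground_truth: list[Term]) -> bool:
    """Exact equality test; no floating-point tolerance involved."""
    return canonical(predicted) == canonical(ground_truth)


# sqrt(12) and 2*sqrt(3) are the same number
same = algebraically_equal([(Fraction(1), 12)], [(Fraction(2), 3)])
# sqrt(8)/2 + 1 and sqrt(2) + 1 are the same number
same_sum = algebraically_equal(
    [(Fraction(1, 2), 8), (Fraction(1), 1)],
    [(Fraction(1), 2), (Fraction(1), 1)],
)
# 1.4142 is within 1e-4 of sqrt(2) but is not equal to it
close_but_wrong = algebraically_equal([(Fraction("1.4142"), 1)], [(Fraction(1), 2)])
```

Exact comparison matters for answer verification: a tolerance-based float check such as `abs(a - b) < 1e-4` would accept 1.4142 as √2, whereas the canonical-form test rejects it.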

Best for: Research Scientist, Computer Vision Engineer, AI Scientist, Machine Learning Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CV updates on arXiv.org.