CurveBench: A Benchmark for Exact Topological Reasoning over Nested Jordan Curves

· Source: cs.CV updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Mathematics & Computational Sciences · Depth: Expert, extended

Summary

CurveBench is a new benchmark for evaluating how well vision-language models (VLMs) reason about hierarchical topology from visual input. It contains 756 images of pairwise non-intersecting Jordan curves, split into "Easy" (300 images with fewer than six curves) and "Hard" (456 images spanning Polygon, Topographical, Maze, and Counting configurations). Each image is annotated with a rooted tree that encodes the containment relations between planar regions, and the task requires models to recover this full rooted containment tree. Initial evaluations show that even state-of-the-art models such as Gemini 3.1 Pro reach only 71.1% tree-generation accuracy on CurveBench-Easy and 19.1% on CurveBench-Hard. Fine-tuning open-weight VLMs with Reinforcement Learning with Verifiable Rewards (RLVR) substantially improved performance: a fine-tuned Qwen3-VL-8B model rose from 2.8% to 33.3% accuracy on CurveBench-Easy, surpassing GPT-5.4 and Claude Opus 4.5 on that split. The benchmark exposes a significant "topological gap" in current VLM capabilities.
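To make the annotation concrete, the sketch below builds a containment tree from curves approximated as polygons. It is a hypothetical illustration, not the benchmark's annotation pipeline: the functions `point_in_polygon`, `area`, `containment_tree`, and `square` are assumptions introduced here. Because the curves are pairwise non-intersecting, testing a single vertex decides whether one curve lies inside another, and among nested enclosers the innermost has the smallest area. The root (`None`) stands for the unbounded outer region, so n curves yield n + 1 region nodes.

```python
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]
Polygon = List[Point]  # hypothetical: a Jordan curve approximated by a simple polygon


def point_in_polygon(p: Point, poly: Polygon) -> bool:
    """Ray casting: toggle on each edge crossed by a rightward ray from p."""
    x, y = p
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal line through p
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def area(poly: Polygon) -> float:
    """Absolute polygon area via the shoelace formula."""
    s = 0.0
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        s += x1 * y2 - x2 * y1
    return abs(s) / 2.0


def containment_tree(curves: List[Polygon]) -> Dict[int, Optional[int]]:
    """Map each curve to its innermost enclosing curve (None = outer region)."""
    parent: Dict[int, Optional[int]] = {}
    for i, c in enumerate(curves):
        # Non-intersecting curves: one vertex suffices to test containment.
        enclosing = [j for j, d in enumerate(curves)
                     if j != i and point_in_polygon(c[0], d)]
        # Enclosers are nested, so the innermost one has the smallest area.
        parent[i] = min(enclosing, key=lambda j: area(curves[j]), default=None)
    return parent


def square(cx: float, cy: float, r: float) -> Polygon:
    """Axis-aligned square with centre (cx, cy) and half-side r."""
    return [(cx - r, cy - r), (cx + r, cy - r), (cx + r, cy + r), (cx - r, cy + r)]


# One big square holding two squares, one of which holds a fourth.
curves = [square(0, 0, 10), square(-4, 0, 3), square(-4, 0, 1), square(5, 0, 2)]
print(containment_tree(curves))  # {0: None, 1: 0, 2: 1, 3: 0}
```

Here curve 2 sits inside curve 1, which sits beside curve 3 inside curve 0, giving a five-node region tree for four curves.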

Key takeaway

For research scientists developing or evaluating vision-language models, CurveBench offers a targeted diagnostic for measuring and improving topological reasoning. The paper's results suggest that RLVR-style fine-tuning with Dr.GRPO and LoRA is worth considering for improving a model's ability to extract hierarchical structure from visual data. Current models, including frontier ones, struggle with exact topological inference, particularly on maze-like and dense curve configurations, which marks a clear direction for future research.
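The core difference between Dr.GRPO and the original GRPO lies in how each sampled response's advantage is computed within its group. The minimal sketch below assumes a binary verifiable reward (1.0 if a sampled tree exactly matches the reference, else 0.0); that reward definition and the function names are illustrative assumptions, since the source states only that the rewards are verifiable. GRPO centers rewards by the group mean and divides by the group standard deviation; Dr.GRPO keeps only the mean-centering. Dr.GRPO also drops GRPO's per-response length normalization of the token-level loss, which is not shown here.

```python
from statistics import mean, pstdev
from typing import List


def grpo_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Original GRPO: center by the group mean, scale by the group std."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def dr_grpo_advantages(rewards: List[float]) -> List[float]:
    """Dr.GRPO: center by the group mean only (no std scaling)."""
    mu = mean(rewards)
    return [r - mu for r in rewards]


# Hypothetical group of four sampled trees for one image; only the first
# exactly matches the reference tree.
rewards = [1.0, 0.0, 0.0, 0.0]
print(dr_grpo_advantages(rewards))  # [0.75, -0.25, -0.25, -0.25]
```

With a binary exact-match reward, a group in which every sample fails yields all-zero advantages under either variant and thus no learning signal, which matters on splits where accuracy is low.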

Key insights

VLMs struggle with exact topological reasoning from visual input, revealing a significant "topological gap."

Principles

Method

CurveBench frames the task as structured prediction: given an image of Jordan curves, a VLM must output a rooted tree that encodes region containment. Predictions are scored by exact tree matching and by node-count accuracy, and open-weight models are fine-tuned with Dr.GRPO using LoRA adapters.
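The two metrics can be sketched as follows, under stated assumptions. The nested-list tree encoding (each node is the list of its children) is hypothetical, since the source does not give the output format, and the sketch assumes sibling order is irrelevant, which is natural for containment. Exact matching then reduces to rooted unordered tree isomorphism, tested here with AHU-style canonical strings; node-count accuracy only checks that the predicted tree has the right number of regions.

```python
from typing import List

Tree = List["Tree"]  # hypothetical encoding: a node is the list of its children


def canonical(node: Tree) -> str:
    """AHU-style canonical string; isomorphic rooted trees get equal strings."""
    return "(" + "".join(sorted(canonical(child) for child in node)) + ")"


def count_nodes(node: Tree) -> int:
    """Number of regions (nodes) in the tree, root included."""
    return 1 + sum(count_nodes(child) for child in node)


def exact_match(pred: Tree, gold: Tree) -> bool:
    """Exact tree match, ignoring the order of siblings."""
    return canonical(pred) == canonical(gold)


def node_count_match(pred: Tree, gold: Tree) -> bool:
    """Weaker check: right number of regions, structure not compared."""
    return count_nodes(pred) == count_nodes(gold)


# Outer region -> one curve holding a nested pair and a single curve.
gold = [[[[]], []]]
swapped = [[[], [[]]]]   # same tree, siblings listed in the other order
flattened = [[[], [], []]]  # right node count, wrong nesting
print(exact_match(swapped, gold), exact_match(flattened, gold))  # True False
print(node_count_match(flattened, gold))  # True
```

Benchmark accuracy would then be the fraction of images whose prediction passes the chosen check; the flattened example shows why node-count agreement alone is a much weaker signal than exact matching.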


Best for: Research Scientist, AI Scientist, Machine Learning Engineer, Computer Vision Engineer


Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CV updates on arXiv.org.