CurveBench: A Benchmark for Exact Topological Reasoning over Nested Jordan Curves
Summary
CurveBench is a benchmark for evaluating how well vision-language models (VLMs) recover hierarchical topological structure from images. It comprises 756 images of pairwise non-intersecting Jordan curves, split into "Easy" (300 images, each with fewer than six curves) and "Hard" (456 images spanning Polygon, Topographical, Maze, and Counting configurations). Each image is annotated with a rooted tree representing the containment relations between planar regions, and the task is to recover that full tree. Initial evaluations show that even a state-of-the-art model such as Gemini 3.1 Pro reaches only 71.1% tree-generation accuracy on CurveBench-Easy and 19.1% on CurveBench-Hard. Fine-tuning open-weight VLMs with Reinforcement Learning with Verifiable Rewards (RLVR) raises accuracy substantially: a fine-tuned Qwen3-VL-8B model improves from 2.8% to 33.3% on CurveBench-Easy, surpassing GPT-5.4 and Claude Opus 4.5. The results point to a "topological gap" in current VLM capabilities.
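To make the annotation concrete, here is a minimal sketch of how a containment tree could be derived when every curve is given as a simple polygon. This is an illustration only, not the benchmark's annotation pipeline: the polygon representation, the function names `point_in_polygon` and `containment_parents`, and the use of `-1` for the unbounded outer region are all assumptions made for this example.

```python
# Hypothetical illustration: curves are simple, pairwise non-intersecting polygons.

def point_in_polygon(pt, poly):
    """Ray-casting test: True if pt lies strictly inside the polygon."""
    x, y = pt
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        # Only edges that straddle the horizontal line through pt can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def containment_parents(curves):
    """Return parent[i] for each curve; -1 means the unbounded outer region."""
    n = len(curves)
    # Because curves never intersect, testing one vertex decides containment.
    containers = [
        [j for j in range(n)
         if j != i and point_in_polygon(curves[i][0], curves[j])]
        for i in range(n)
    ]
    # The immediate parent is the enclosing curve that is itself nested deepest.
    return [
        max(containers[i], key=lambda j: len(containers[j]), default=-1)
        for i in range(n)
    ]


def square(x0, y0, x1, y1):
    return [(x0, y0), (x1, y0), (x1, y1), (x0, y1)]


curves = [
    square(0, 0, 10, 10),  # 0: outermost curve
    square(1, 1, 4, 4),    # 1: inside curve 0
    square(2, 2, 3, 3),    # 2: inside curve 1
    square(6, 6, 8, 8),    # 3: inside curve 0, sibling of curve 1
]
parents = containment_parents(curves)
print(parents)
```

The parent list encodes the rooted tree: the outer region is the root, curves 1 and 3 are children of curve 0, and curve 2 is nested inside curve 1.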
Key takeaway
For research scientists developing or evaluating vision-language models, CurveBench offers a diagnostic for measuring and improving topological reasoning. The reported results suggest that RLVR-style fine-tuning with Dr.GRPO and LoRA is worth considering when a model must extract hierarchical structure from visual data. Even advanced models struggle with exact topological inference, particularly on maze-like and dense curve configurations, which marks this as an open area for research.
Key insights
VLMs struggle with exact topological reasoning from visual input, revealing a significant "topological gap."
Principles
- Topological reasoning is an algorithmic problem.
- RLVR can improve structured reasoning in VLMs.
- LoRA is effective for sparse RL signals.
Method
CurveBench frames the task as structured prediction: given an image of Jordan curves, a VLM must output a rooted tree encoding region containment. Predictions are scored by exact tree-matching and node-count accuracy. For fine-tuning, models are optimized via Dr.GRPO with LoRA.
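The two scoring criteria can be sketched as follows. This is a hedged illustration, not the benchmark's evaluation code: it assumes trees are unlabeled and that sibling order does not matter, and it represents a node as the list of its child subtrees. The names `canonical`, `count_nodes`, `exact_match`, and `node_count_match` are hypothetical.

```python
# Assumption: a tree node is a Python list of its children; [] is a leaf region.

def canonical(tree):
    """Canonical parenthesis string; sorting child encodings ignores sibling order."""
    return "(" + "".join(sorted(canonical(child) for child in tree)) + ")"


def count_nodes(tree):
    """Number of regions in the tree, including the root."""
    return 1 + sum(count_nodes(child) for child in tree)


def exact_match(pred, gold):
    """True only if the two rooted trees are isomorphic."""
    return canonical(pred) == canonical(gold)


def node_count_match(pred, gold):
    """Weaker criterion: the prediction has the right number of regions."""
    return count_nodes(pred) == count_nodes(gold)


# Outer region holds two curves; one of them encloses a third curve.
gold = [[], [[]]]
pred_reordered = [[[]], []]  # same tree, siblings listed in another order
pred_chain = [[[[]]]]        # three curves nested one inside the next

print(exact_match(pred_reordered, gold))   # True
print(exact_match(pred_chain, gold))       # False
print(node_count_match(pred_chain, gold))  # True
```

The last case shows why the two metrics differ: a model can count the regions correctly and still get the nesting wrong.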
In practice
- Use CurveBench for VLM topological reasoning diagnostics.
- Apply RLVR with Dr.GRPO for structured prediction tasks.
- Employ LoRA for efficient VLM fine-tuning.
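As a rough sketch of what a verifiable reward could look like for this task, the example below scores a model's text output against a gold tree and then computes group-relative advantages in the Dr.GRPO style (each reward minus the group mean, without dividing by the group's standard deviation). The balanced-parenthesis output format, the binary 0/1 reward, and the names `parse`, `canonical`, `reward`, and `dr_grpo_advantages` are assumptions for illustration; the source does not specify the reward the authors used.

```python
from statistics import mean


def parse(text):
    """Parse a balanced-parenthesis string into nested lists; None if malformed."""
    stack, root = [], None
    for ch in text:
        if ch.isspace():
            continue
        if ch == "(":
            if root is not None:  # a second top-level tree is not allowed
                return None
            node = []
            if stack:
                stack[-1].append(node)
            stack.append(node)
        elif ch == ")":
            if not stack:
                return None
            node = stack.pop()
            if not stack:
                root = node
        else:
            return None
    return root if not stack else None


def canonical(tree):
    """Order-independent encoding of a rooted tree."""
    return "(" + "".join(sorted(canonical(child) for child in tree)) + ")"


def reward(output, gold):
    """Hypothetical binary reward: 1.0 for an exact tree match, else 0.0."""
    pred = parse(output)
    if pred is None:
        return 0.0
    return 1.0 if canonical(pred) == canonical(gold) else 0.0


def dr_grpo_advantages(rewards):
    """Group-relative advantage: reward minus group mean, no std scaling."""
    baseline = mean(rewards)
    return [r - baseline for r in rewards]


gold = [[], [[]]]
# Four sampled completions for one image: two correct, one wrong, one malformed.
outputs = ["(()(()))", "((())())", "(((())))", "(()"]
rewards = [reward(o, gold) for o in outputs]
advantages = dr_grpo_advantages(rewards)
print(rewards)
print(advantages)
```

Because the reward is computed by a deterministic check rather than a learned judge, it cannot be gamed by fluent but incorrect output, which is the property that makes the task suitable for RLVR.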
Topics
- Topological Reasoning
- Jordan Curves
- Rooted Containment Trees
- Vision-Language Models
- Reinforcement Learning with Verifiable Rewards
Best for: Research Scientist, AI Scientist, Machine Learning Engineer, Computer Vision Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CV updates on arXiv.org.