PlanarBench: Evaluating LLM Spatial Reasoning via Planar Graph Drawing
Summary
PlanarBench is a benchmark that evaluates the spatial reasoning of Large Language Models by asking them to draw planar graphs as ASCII art from an edge list. The task is designed to resist memorization: edge order, edge orientation, and node labels can all be permuted without changing the underlying graph. The evaluation covered 91 models on 199 of the simplest non-isomorphic connected planar graphs, ranging from 2 to 7 vertices. The main finding is that edge count is the dominant predictor of task difficulty, with a strong negative correlation ($r = -0.85$). Other LLM graph benchmarks, which typically focus solely on node count, have not highlighted this factor.
Key takeaway
For AI scientists and researchers developing or evaluating LLMs, PlanarBench highlights a critical gap in spatial reasoning, particularly for tasks requiring non-memorized graph generation. You should consider incorporating graph drawing challenges, specifically those with varying edge counts, into your model training and evaluation protocols. This approach will better assess true reasoning capabilities beyond simple pattern recognition, guiding the development of more robust and spatially aware LLMs.
Key insights
PlanarBench evaluates LLM spatial reasoning by having models draw planar graphs as ASCII art from edge lists, a task designed to resist memorization.
Principles
- The drawing task resists memorization because edge order, orientation, and node labels are all permutable.
- Edge count predicts graph drawing difficulty ($r = -0.85$).
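The memorization-resistance principle can be illustrated with a minimal sketch that produces an equivalent edge list by shuffling edge order, flipping edge orientation, and relabeling nodes. This is not the benchmark's own code: the function names `permute_edge_list` and `degree_sequence`, the seed parameter, and the example graph are all assumptions made for illustration.

```python
import random

def permute_edge_list(edges, seed=None):
    """Return an isomorphic edge list with shuffled edge order,
    randomly flipped edge orientation, and relabeled nodes.
    (Hypothetical helper, not part of PlanarBench.)"""
    rng = random.Random(seed)
    nodes = sorted({v for edge in edges for v in edge})
    shuffled = nodes[:]
    rng.shuffle(shuffled)
    relabel = dict(zip(nodes, shuffled))  # old label -> new label

    permuted = []
    for u, v in edges:
        a, b = relabel[u], relabel[v]
        if rng.random() < 0.5:  # flip orientation of this undirected edge
            a, b = b, a
        permuted.append((a, b))
    rng.shuffle(permuted)  # shuffle edge order
    return permuted

def degree_sequence(edges):
    """Sorted vertex degrees; unchanged by any of the permutations above."""
    counts = {}
    for u, v in edges:
        counts[u] = counts.get(u, 0) + 1
        counts[v] = counts.get(v, 0) + 1
    return sorted(counts.values())

# Example graph (an assumption): a 4-cycle with one diagonal.
square_with_diagonal = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
variant = permute_edge_list(square_with_diagonal, seed=1)
```

Every variant describes the same graph, so a model cannot succeed by recalling one fixed textual form of the input. The degree sequence is a cheap sanity check that the structure survived the permutation, though it is not a full isomorphism test.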
Method
PlanarBench gives an LLM an edge list and requires it to output a drawing of the planar graph as ASCII art. Performance was evaluated across 91 models on 199 graphs with 2 to 7 vertices.
In practice
- Test LLMs on graph drawing tasks.
- Scale task difficulty by edge count, not only by node count.
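One way to act on the edge-count advice is to group test graphs by number of edges and to know how many edges a planar graph of a given size can have at all. The sketch below relies on the standard bound that a simple planar graph with $n \ge 3$ vertices has at most $3n - 6$ edges. The helper names and the sample graphs are assumptions for illustration, not part of PlanarBench.

```python
from collections import defaultdict

def max_planar_edges(n):
    """Upper bound on edges of a simple planar graph with n vertices.
    For n >= 3 this is 3n - 6; smaller graphs are handled directly."""
    if n < 3:
        return n * (n - 1) // 2  # 0 edges for n <= 1, 1 edge for n = 2
    return 3 * n - 6

def bucket_by_edge_count(graphs):
    """Group edge lists by their number of edges (hypothetical helper),
    so difficulty tiers can be built from the edge count."""
    buckets = defaultdict(list)
    for edges in graphs:
        buckets[len(edges)].append(edges)
    return dict(sorted(buckets.items()))

# Sample graphs (assumptions): a path, a triangle, and K4.
path = [(0, 1), (1, 2)]
triangle = [(0, 1), (1, 2), (2, 0)]
k4 = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]

tiers = bucket_by_edge_count([k4, path, triangle])
```

For the 7-vertex graphs at the top of the benchmark's range, the bound gives at most 15 edges, while a connected graph on 7 vertices needs at least 6, so edge count varies widely even at a fixed node count.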
Topics
- PlanarBench
- LLM Evaluation
- Spatial Reasoning
- Graph Drawing
- ASCII Art
- Benchmark Design
Best for: Research Scientist, AI Scientist, AI Student
Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.