DragOn: A Benchmark and Dataset for Drag-Based GUI Interactions

2026-06-04 · Source: Artificial Intelligence · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Robotics & Autonomous Systems · Depth: Expert, quick

Summary

DragOn is a benchmark and training dataset for improving how GUI agents handle drag-based interactions. It addresses a data gap: drag-grounding data is an order of magnitude smaller in volume than click-grounding data. DragOn covers four domains: text highlighting, cell selection, element resizing, and slider manipulation. The dataset comprises 286K training screenshots and 3.5M training tasks, alongside a 2000-example held-out evaluation suite. The researchers evaluated several leading models, including proprietary ones such as GPT and Claude and open-weight models such as Qwen, Kimi, and Holo. A Qwen vision-language model (VLM) fine-tuned on the DragOn training data showed improved performance, suggesting the dataset can make current models more effective in downstream computer-use tasks.

Key takeaway

If you build GUI automation agents and your models struggle with drag-based interactions, consider fine-tuning on the DragOn dataset. Fine-tuning a vision-language model on DragOn's 286K training screenshots and 3.5M training tasks may improve performance on tasks such as text highlighting and element resizing. Use the 2000-example held-out suite to validate your agent's drag capabilities.

Key insights

DragOn provides a crucial, large-scale dataset and benchmark to advance GUI agents' drag-based interaction capabilities.

Principles

Large-scale data is critical for complex GUI agent tasks.
Diverse drag interaction types improve model generalization.

Method

DragOn comprises 286K training screenshots and 3.5M training tasks across four drag-based GUI domains. Proprietary and open-weight VLMs were then evaluated on a 2000-example held-out suite.
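The training counts imply roughly a dozen tasks per screenshot. The sketch below checks that arithmetic and shows one way a drag task could be represented. The four domain names come from the summary; the `DragTask` record and all of its field names are hypothetical and are not DragOn's published schema.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Tuple


class DragDomain(Enum):
    """The four interaction domains named in the summary."""
    TEXT_HIGHLIGHTING = "text_highlighting"
    CELL_SELECTION = "cell_selection"
    ELEMENT_RESIZING = "element_resizing"
    SLIDER_MANIPULATION = "slider_manipulation"


@dataclass(frozen=True)
class DragTask:
    """Hypothetical record layout; field names are assumptions."""
    screenshot_id: str
    domain: DragDomain
    instruction: str
    start_xy: Tuple[float, float]  # where the drag begins, in pixels
    end_xy: Tuple[float, float]    # where the drag is released, in pixels


# Counts reported in the summary.
TRAIN_SCREENSHOTS = 286_000
TRAIN_TASKS = 3_500_000


def tasks_per_screenshot(tasks: int = TRAIN_TASKS,
                         screenshots: int = TRAIN_SCREENSHOTS) -> float:
    """Average number of training tasks attached to each screenshot."""
    return tasks / screenshots
```

Calling `tasks_per_screenshot()` gives a value a little above 12, so each screenshot is reused for many distinct drag instructions on average.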

In practice

Fine-tune VLMs using the DragOn training dataset.
Benchmark GUI agents on DragOn's evaluation suite.
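The summary does not say how DragOn scores a prediction. As a minimal sketch of what a benchmarking loop could look like, the code below assumes a drag counts as correct when both its predicted start and end points fall within a pixel tolerance of the reference points. That scoring rule, the `tol_px` default, the example dictionary keys, and the `predict` callable are all assumptions for illustration.

```python
import math
from collections import defaultdict
from typing import Callable, Dict, Iterable, Tuple

Point = Tuple[float, float]


def drag_correct(pred_start: Point, pred_end: Point,
                 gold_start: Point, gold_end: Point,
                 tol_px: float = 10.0) -> bool:
    """Assumed rule: both endpoints must lie within tol_px of the reference."""
    return (math.dist(pred_start, gold_start) <= tol_px
            and math.dist(pred_end, gold_end) <= tol_px)


def success_rate_by_domain(
    examples: Iterable[dict],
    predict: Callable[[dict], Tuple[Point, Point]],
    tol_px: float = 10.0,
) -> Dict[str, float]:
    """Fraction of correct drags per domain.

    Each example is a dict with hypothetical keys "domain", "start", "end";
    predict(example) returns a (start, end) pair from the model under test.
    """
    hits: Dict[str, int] = defaultdict(int)
    totals: Dict[str, int] = defaultdict(int)
    for ex in examples:
        pred_start, pred_end = predict(ex)
        totals[ex["domain"]] += 1
        if drag_correct(pred_start, pred_end, ex["start"], ex["end"], tol_px):
            hits[ex["domain"]] += 1
    return {domain: hits[domain] / totals[domain] for domain in totals}
```

Reporting per-domain rates rather than a single number makes it easier to see whether a model fails on, say, slider manipulation while handling cell selection well. Swap in the benchmark's actual scoring rule once it is known.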

Topics

GUI Agents
Drag Grounding
Datasets
Benchmarking
Vision-Language Models
Human-Computer Interaction

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, AI Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.