ComAct: Reframing Professional Software Manipulation via COM-as-Action Paradigm

Source: Artificial Intelligence · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Robotics & Autonomous Systems · Depth: Expert, quick

Summary

ComAct introduces a "COM-as-Action" paradigm to address the limitations of computer-use agents in manipulating professional software. Existing GUI-based agents suffer from fragile visual grounding and error accumulation, while API-based approaches struggle with heterogeneous protocols and inaccessible commercial interfaces. ComAct reframes software interaction as deterministic program synthesis, using the Component Object Model (COM) as a unified executable abstraction. To evaluate the paradigm, the authors built ComCADBench, the first benchmark for agents operating real industrial CAD software. Experiments on ComCADBench reveal a significant paradigm gap: frontier proprietary models achieve near-zero success through GUI-based interaction, whereas switching them to COM-based execution yields substantial gains immediately. The work also presents ComActor, a self-correcting agent trained through a three-stage framework, and ComForge, a scalable training platform built on Windows containers. ComActor achieves state-of-the-art performance on ComCADBench, remains resilient on long-horizon tasks, and generalizes to external CAD benchmarks.

Key takeaway

For AI engineers building agents for professional software, especially industrial CAD, COM-based interaction is worth prioritizing over GUI-driven interaction or application-specific APIs whenever the target application exposes a COM automation interface. The COM-as-Action paradigm shows greater resilience and higher success rates, particularly on long-horizon tasks where visual grounding breaks down. Consider pairing it with a self-correcting agent design, such as ComActor's three-stage training framework, and a scalable training platform such as ComForge to achieve robust automation.

Key insights

The COM-as-Action paradigm reframes software interaction as deterministic program synthesis, enabling robust manipulation of professional applications where GUI-based agents fail.
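
To make the contrast concrete, here is a minimal Python sketch built on a toy object model. The agent's action is a short program run against the application's objects rather than a click at screen coordinates. `FakeCadApp`, `FakeSketch`, `NewSketch`, `AddCircle`, and `execute_action` are hypothetical names invented for illustration. They are not ComAct's interface or any real CAD API. On Windows, pywin32's `win32com.client.Dispatch` would return a live COM automation object in place of `FakeCadApp`.

```python
# Minimal "COM-as-Action" sketch: an action is a program run against an
# object model, not a GUI event. All class and method names are hypothetical.


class FakeSketch:
    """Hypothetical sketch object, standing in for a COM-exposed CAD object."""

    def __init__(self):
        self.circles = []

    def AddCircle(self, x, y, r):
        if r <= 0:
            raise ValueError("radius must be positive")
        self.circles.append((x, y, r))


class FakeCadApp:
    """Hypothetical in-memory stand-in for a COM automation object.

    On Windows, win32com.client.Dispatch("<ProgID>") from pywin32 would
    return a live application object instead.
    """

    def __init__(self):
        self.sketches = []

    def NewSketch(self):
        sketch = FakeSketch()
        self.sketches.append(sketch)
        return sketch


def execute_action(app, program: str) -> dict:
    """Execute an agent-generated program with `app` bound as a local name.

    In this toy model the same program on the same starting state always has
    the same effect, unlike a click whose target depends on visual grounding.
    """
    namespace = {"app": app}
    try:
        # Empty builtins keeps the toy example minimal; this is NOT a sandbox.
        exec(program, {"__builtins__": {}}, namespace)
        return {"ok": True, "error": None}
    except Exception as exc:
        # The exception type and message become feedback for the agent.
        return {"ok": False, "error": f"{type(exc).__name__}: {exc}"}


app = FakeCadApp()
print(execute_action(app, "s = app.NewSketch()\ns.AddCircle(0, 0, 5)"))
# {'ok': True, 'error': None}
print(app.sketches[0].circles)  # [(0, 0, 5)]
print(execute_action(app, "app.NewSketch().AddCircle(0, 0, -1)"))
# {'ok': False, 'error': 'ValueError: radius must be positive'}
```

Because the action is code, a failure surfaces as a typed, readable error message that can be returned to the agent, rather than as a silent misclick on the wrong widget.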

Method

ComActor is a self-correcting agent trained with a progressive three-stage framework. Training runs on ComForge, which scales it out across Windows containers, and the framework aims to close gaps in both syntactic accuracy and geometric accuracy.
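
The summary does not describe ComActor's internals, so the following is only a generic sketch of an execute-and-repair loop of the kind a self-correcting agent could run at inference time. It does not model the three-stage training. `propose` stands in for the policy model. `FakeCadApp`, `Extrude`, `run`, `self_correct`, and `scripted_policy` are all hypothetical names. Starting each attempt from a fresh application instance is an assumption of this sketch, not a documented part of ComActor.

```python
from typing import Callable, List, Optional, Tuple


class FakeCadApp:
    """Hypothetical stand-in for a COM automation object."""

    def __init__(self):
        self.features = []

    def Extrude(self, depth):
        if depth <= 0:
            raise ValueError("depth must be positive")
        self.features.append(("extrude", depth))


def run(app, program: str) -> Optional[str]:
    """Execute `program` against `app`; return an error string, or None on success."""
    try:
        exec(program, {"__builtins__": {}}, {"app": app})  # toy only, not a sandbox
        return None
    except Exception as exc:
        return f"{type(exc).__name__}: {exc}"


def self_correct(
    propose: Callable[[List[str]], str],
    make_app: Callable[[], FakeCadApp],
    max_attempts: int = 3,
) -> Tuple[Optional[FakeCadApp], int, List[str]]:
    """Generate, execute, and feed errors back until success or budget exhaustion.

    `propose` stands in for the policy model: it sees every error so far and
    returns a new program. Each attempt runs on a fresh app instance so a
    failed attempt leaves no partial state behind (a choice of this sketch).
    """
    errors: List[str] = []
    for attempt in range(1, max_attempts + 1):
        app = make_app()
        error = run(app, propose(errors))
        if error is None:
            return app, attempt, errors
        errors.append(error)
    return None, max_attempts, errors


def scripted_policy(errors: List[str]) -> str:
    # Hypothetical policy: emits an invalid depth, then repairs it after feedback.
    return "app.Extrude(-5)" if not errors else "app.Extrude(5)"


app, attempts, errors = self_correct(scripted_policy, FakeCadApp)
print(attempts)      # 2
print(errors)        # ['ValueError: depth must be positive']
print(app.features)  # [('extrude', 5)]
```

Resetting state per attempt keeps each retry deterministic and easy to score. The trade-off is that long-horizon tasks must replay earlier steps instead of resuming from a partially built model.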

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, AI Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.