ComAct: Reframing Professional Software Manipulation via COM-as-Action Paradigm
Summary
ComAct introduces a novel "COM-as-Action" paradigm to address limitations in professional software manipulation by computer-use agents. Existing GUI-based agents suffer from fragile visual grounding and error accumulation, while API-based approaches struggle with heterogeneous protocols and inaccessible commercial interfaces. ComAct reframes software interaction as deterministic program synthesis using the Component Object Model (COM) as a unified executable abstraction. To validate this, the authors developed ComCADBench, the first benchmark for agents operating real industrial CAD software. Experiments on ComCADBench reveal a significant paradigm gap: frontier proprietary models achieve near-zero success with GUI-based interaction, whereas COM-based execution yields substantial immediate gains. The work also presents ComActor, a self-correcting agent trained through a three-stage framework, and ComForge, a scalable training platform using Windows containers. ComActor achieves state-of-the-art performance on ComCADBench, demonstrating strong resilience in long-horizon tasks and generalizing to external CAD benchmarks.
Key takeaway
If you are an AI engineer building agents for professional software, especially industrial CAD, prioritize COM-based interaction over GUI-based control or heterogeneous application-specific APIs. On ComCADBench, the "COM-as-Action" paradigm shows higher success rates and greater resilience, particularly on long-horizon tasks where visual grounding fails. Consider training self-correcting agents, as ComActor does with its three-stage framework, and using a scalable training platform such as ComForge to make automation more robust.
Key insights
The COM-as-Action paradigm reframes software interaction as deterministic program synthesis, enabling robust manipulation of professional applications where GUI-based agents fail.
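As a rough illustration of what "interaction as program synthesis" can mean, the sketch below treats one agent action as a whole program executed against an object model, rather than a sequence of clicks. Everything in it is an assumption made for illustration: `MockModelSpace`, its `AddLine` and `AddCircle` methods, and `run_action` are hypothetical stand-ins, not the paper's interface or any real CAD product's COM API. On Windows, a real automation object would take the place of the mock.

```python
from dataclasses import dataclass, field


@dataclass
class MockModelSpace:
    """Hypothetical stand-in for a CAD application's object model (not a real COM interface)."""
    entities: list = field(default_factory=list)

    def AddLine(self, start, end):
        # Record a line segment and return its index in the entity list.
        self.entities.append(("line", tuple(start), tuple(end)))
        return len(self.entities) - 1

    def AddCircle(self, center, radius):
        # Reject invalid geometry the way a real object model might.
        if radius <= 0:
            raise ValueError("radius must be positive")
        self.entities.append(("circle", tuple(center), float(radius)))
        return len(self.entities) - 1


def run_action(program: str, model: MockModelSpace) -> MockModelSpace:
    """Execute one agent action: a generated program run against the object model.

    exec() is used only to keep the sketch short; it is not a sandbox.
    """
    exec(program, {"model": model})
    return model


# A program an agent might emit for "draw an L-shaped outline with a hole".
ACTION = """
model.AddLine((0, 0), (10, 0))
model.AddLine((10, 0), (10, 5))
model.AddCircle((5, 2.5), 1.0)
"""

# Running the same program on two fresh models gives identical results,
# which is the sense of "deterministic" illustrated here.
first = run_action(ACTION, MockModelSpace())
second = run_action(ACTION, MockModelSpace())
```

The point of the design is that the outcome depends only on the program text and the starting state, with no screenshot parsing or coordinate grounding in between.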
Principles
- COM offers a unified executable abstraction.
- Deterministic program synthesis outperforms visual control.
- COM-based execution yields substantial performance gains.
Method
ComActor is a self-correcting agent trained via a progressive three-stage framework, leveraging ComForge for scalable training in Windows containers to bridge syntactic and geometric accuracy gaps.
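The summary does not say how ComActor's self-correction works internally, so the following is only a generic execute-check-retry loop, offered as one plausible reading. It feeds two kinds of feedback back to the generator: runtime errors (loosely, the "syntactic" gap) and a failed geometry check (the "geometric" gap). `MockSketch`, `fake_generator`, `check_geometry`, and `solve` are all hypothetical names; `fake_generator` is a scripted placeholder for a model call.

```python
from typing import Callable, Optional, Tuple


class MockSketch:
    """Hypothetical stand-in for a CAD object model; not a real COM interface."""

    def __init__(self):
        self.circles = []

    def AddCircle(self, center, radius):
        self.circles.append((tuple(center), float(radius)))


def fake_generator(task: str, feedback: Optional[str]) -> str:
    """Placeholder for a model call, scripted to fail twice before succeeding."""
    if feedback is None:
        return "sketch.AddCircl((0, 0), 2)"   # misspelled method: runtime error
    if feedback.startswith("runtime"):
        return "sketch.AddCircle((0, 0), 3)"  # runs, but the radius is wrong
    return "sketch.AddCircle((0, 0), 2)"      # correct program


def check_geometry(sketch: MockSketch) -> Optional[str]:
    """Return None if the result matches the task, otherwise a description of the mismatch."""
    if sketch.circles == [((0, 0), 2.0)]:
        return None
    return f"expected one circle of radius 2 at the origin, got {sketch.circles}"


def solve(
    task: str,
    generate: Callable[[str, Optional[str]], str],
    check: Callable[[MockSketch], Optional[str]],
    max_attempts: int = 4,
) -> Tuple[MockSketch, int]:
    """Generate, execute, and verify a program, retrying with feedback on failure."""
    feedback = None
    for attempt in range(1, max_attempts + 1):
        program = generate(task, feedback)
        sketch = MockSketch()  # fresh state so failed attempts leave no residue
        try:
            exec(program, {"sketch": sketch})
        except Exception as exc:
            feedback = f"runtime: {type(exc).__name__}: {exc}"
            continue
        problem = check(sketch)
        if problem is None:
            return sketch, attempt
        feedback = f"geometry: {problem}"
    raise RuntimeError(f"no valid program after {max_attempts} attempts; last feedback: {feedback}")


sketch, attempts = solve(
    "draw a circle of radius 2 at the origin", fake_generator, check_geometry
)
```

Resetting the object model on each attempt is a simplification; a long-horizon agent would more likely need to repair state incrementally.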
In practice
- Automate industrial CAD via COM-based agents.
- Implement self-correction for long-horizon tasks.
- Utilize Windows containers for scalable training.
Topics
- Component Object Model
- Program Synthesis
- Software Agents
- Industrial CAD
- ComCADBench
- ComActor
Best for: Research Scientist, AI Scientist, Machine Learning Engineer, AI Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.