ContractSkill: Repairable Contract-Based Skills for Multimodal Web Agents
Summary
ContractSkill is a framework for making the self-generated skills of multimodal web agents more reliable and reusable. Skills generated on demand tend to be brittle, so ContractSkill converts them into "contracted executable artifacts" with explicit preconditions, step specifications, postconditions, recovery rules, and termination checks. This structured representation enables deterministic verification, step-level fault localization, and minimal patch-based repair, so skill refinement shifts from full regeneration to localized editing. In experiments on the VisualWebArena and MiniWoB benchmarks with the GLM-4.6V and Qwen3.5-Plus models, success rates on VisualWebArena rose from 9.4% to 28.1% for GLM-4.6V and from 10.9% to 37.5% for Qwen3.5-Plus. On MiniWoB, rates rose from 66.5% and 60.5% to 77.5% and 81.0%. Repaired artifacts also transfer across models, raising target models' baseline performance by up to 47.8 points on VisualWebArena and 12.8 points on MiniWoB.
Key takeaway
For AI architects and engineers building multimodal web agents, a contract-based skill framework such as ContractSkill addresses the brittleness of self-generated skills. It turns fragile, one-off prompts into verifiable, repairable procedural assets, which improves task success rates and enables cross-model skill transfer. Prioritize externalizing skills with explicit execution semantics to build more reliable and maintainable agent systems.
Key insights
Explicitly structured, verifiable, and repairable skill artifacts significantly improve multimodal web agent performance and reusability.
Principles
- Skills should be explicit procedural artifacts.
- Deterministic verification enables precise fault localization.
- Minimal patch-based repair is more stable than full regeneration.
Method
ContractSkill compiles draft skills into structured artifacts with contracts, uses a deterministic verifier for step-level fault localization, and applies minimal patch operators within an iterative execute-diagnose-patch-validate loop.
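The execute-diagnose-patch-validate loop described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: `Step`, `first_failing_step`, `repair_loop`, `propose_patch`, and the dictionary-based page state are all hypothetical names invented for this example, and the toy "click" action stands in for a real browser interaction.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

State = dict  # hypothetical stand-in for an observable page state


@dataclass
class Step:
    name: str
    action: Callable[[State], State]        # executes the step, returns the next state
    postcondition: Callable[[State], bool]  # deterministic check on the resulting state


def first_failing_step(steps: List[Step], state: State) -> Optional[int]:
    """Execute steps in order; return the index of the first failed postcondition."""
    current = dict(state)
    for i, step in enumerate(steps):
        current = step.action(dict(current))
        if not step.postcondition(current):
            return i
    return None


def repair_loop(steps: List[Step], state: State,
                propose_patch: Callable[[Step], Step],
                max_rounds: int = 3) -> Tuple[List[Step], bool]:
    """Patch only the failing step; keep a patch only if it validates."""
    steps = list(steps)
    for _ in range(max_rounds):
        failed = first_failing_step(steps, state)      # execute + diagnose
        if failed is None:
            return steps, True
        candidate = list(steps)
        candidate[failed] = propose_patch(steps[failed])  # localized patch
        new_failed = first_failing_step(candidate, state)  # validate
        if new_failed is None or new_failed > failed:
            steps = candidate  # the failure disappeared or moved later
        else:
            break  # patch did not help; stop instead of retrying the same edit
    return steps, first_failing_step(steps, state) is None


def open_form(state: State) -> State:
    state["form_open"] = True
    return state


def make_click(target: str) -> Callable[[State], State]:
    # Toy action: the form counts as submitted only if the right button is clicked.
    def action(state: State) -> State:
        state["submitted"] = (target == "submit")
        return state
    return action


draft = [
    Step("open_form", open_form, lambda s: s.get("form_open", False)),
    Step("click_submit", make_click("cancel"), lambda s: s.get("submitted", False)),
]


def propose_patch(step: Step) -> Step:
    # Hypothetical patch operator: swap the action, keep the contract unchanged.
    return Step(step.name, make_click("submit"), step.postcondition)


repaired, ok = repair_loop(draft, {}, propose_patch)
```

In this sketch the first step is never touched: only the step whose postcondition failed is replaced, which is the sense in which repair is localized rather than a full regeneration.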
In practice
- Implement explicit preconditions and postconditions for agent steps.
- Design deterministic verifiers for observable page states.
- Favor minimal, localized edits over full skill rewrites.
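The three practices above can be illustrated with a small sketch. It assumes a hypothetical page snapshot made of a URL and a set of visible element ids; `StepContract`, `verify_step`, the selectors, and the example URLs are invented for illustration and are not taken from the ContractSkill paper.

```python
from dataclasses import dataclass, replace

Page = dict  # hypothetical snapshot: {"url": str, "visible_ids": set of selectors}


@dataclass(frozen=True)
class StepContract:
    name: str
    selector: str           # precondition: this element must be visible before the step
    expected_url_part: str  # postcondition: the URL must contain this after the step


def verify_step(contract: StepContract, before: Page, after: Page) -> str:
    """Deterministic verifier: the same snapshots always yield the same verdict."""
    if contract.selector not in before["visible_ids"]:
        return "precondition_failed"
    if contract.expected_url_part not in after["url"]:
        return "postcondition_failed"
    return "ok"


before = {"url": "https://shop.example/item/1",
          "visible_ids": {"#add-to-cart", "#search"}}
after = {"url": "https://shop.example/cart",
         "visible_ids": {"#checkout"}}

# The draft step targets a selector that is not on the page.
draft = StepContract("add_to_cart", "#buy", "/cart")
verdict_draft = verify_step(draft, before, after)

# Localized edit: change one field and leave the rest of the contract as it was.
patched = replace(draft, selector="#add-to-cart")
verdict_patched = verify_step(patched, before, after)
```

Because the verdict names which check failed, the repair can target a single field of a single step instead of rewriting the whole skill.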
Topics
- ContractSkill Framework
- Multimodal Web Agents
- Skill Repair
- Deterministic Verification
- Fault Localization
Best for: AI Architect, AI Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, Robotics Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.