Introducing the New Browser Automation Tool with Toolboxes in Foundry

Source: Microsoft Foundry Blog articles · Field: Technology & Digital (Artificial Intelligence & Machine Learning, Software Development & Engineering, Robotics & Autonomous Systems) · Depth: Intermediate, short

Summary

The new Browser Automation Tool with Toolboxes in Foundry lets AI agents interact with web interfaces directly, covering tasks that reasoning alone cannot complete. It is available as an MCP-native tool in Toolboxes (Public Preview) for hosted agents and uses Playwright workspaces, which are generally available, as its execution model and headless browser infrastructure; the source credits this setup with faster automation and real-time visibility. Live View (Public Preview) shows a run in real time so that issues such as selector drift and navigation failures can be detected as they occur, and Take Control (Public Preview) lets a human step in on non-deterministic paths such as CAPTCHAs. The tool also supports private website browsing (Private Preview) for internal portals and authenticated flows, integrates with observability in the Foundry Control Plane, and lets teams choose among open-source reasoning layers. Together these features support end-to-end agent workflows, enterprise-grade systems, and human-in-the-loop experiences for tasks such as form filling and web research.

Key takeaway

For AI engineers building agentic workflows that interact with web interfaces, the updated Browser Automation Tool in Foundry adds several capabilities worth evaluating. Playwright-powered agents can automate complex web tasks, including private sites (in Private Preview), with human-in-the-loop controls. Use Live View to debug runs in real time and Take Control to handle unpredictable UI elements such as CAPTCHAs. Evaluate the tool if your agents need to complete end-to-end workflows that APIs alone cannot cover.

Key insights

The Browser Automation Tool extends AI agents' web interaction capabilities with human-in-the-loop controls (Live View and Take Control) and uses Playwright workspaces as the underlying browser infrastructure.

Method

Define the browser workflow in the Hosted Agent setup, run it with Live View open, detect and resolve issues as they surface, use Take Control on complex branches, then resume the automation and capture the outcomes.
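The control flow of this method can be sketched in plain Python. This is only an illustration of the pattern (automated steps, a hand-off to a human when a step hits a non-deterministic branch, then resumption), not the Foundry or Playwright API. Every name here (`Step`, `RunLog`, `NeedsHuman`, `run_workflow`, `take_control`) is hypothetical, and the event log merely stands in for the kind of visibility Live View provides.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


class NeedsHuman(Exception):
    """Raised by a step that hits a branch automation cannot resolve (e.g. a CAPTCHA)."""


@dataclass
class Step:
    name: str
    action: Callable[[], str]  # performs the step and returns an outcome string


@dataclass
class RunLog:
    # Hypothetical stand-in for a live event stream an operator could watch.
    events: List[str] = field(default_factory=list)


def run_workflow(
    steps: List[Step],
    take_control: Callable[[Step], str],
    log: RunLog,
) -> Dict[str, str]:
    """Run steps in order; hand a step to a human when it raises NeedsHuman."""
    outcomes: Dict[str, str] = {}
    for step in steps:
        log.events.append(f"start:{step.name}")
        try:
            outcomes[step.name] = step.action()
        except NeedsHuman as exc:
            # Pause automation, let the operator finish this step, then resume.
            log.events.append(f"handoff:{step.name}:{exc}")
            outcomes[step.name] = take_control(step)
            log.events.append(f"resumed:{step.name}")
        log.events.append(f"done:{step.name}")
    return outcomes


def blocked_by_captcha() -> str:
    # Simulates a step that automation cannot complete on its own.
    raise NeedsHuman("captcha")


if __name__ == "__main__":
    workflow = [
        Step("open_form", lambda: "loaded"),
        Step("pass_captcha", blocked_by_captcha),
        Step("submit", lambda: "submitted"),
    ]
    run_log = RunLog()
    result = run_workflow(workflow, lambda step: "solved by operator", run_log)
    print(result)
    print(run_log.events)
```

In a real agent the `action` callables would drive a browser session and `take_control` would block until the operator hands the session back; both are simulated here so the sketch runs without any external service.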

Best for: AI Architect, AI Product Manager, AI Engineer, MLOps Engineer, Automation Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Microsoft Foundry Blog articles.