Introducing OS Level Actions in Amazon Bedrock AgentCore Browser

2026-05-05 · Source: Artificial Intelligence · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Robotics & Autonomous Systems · Depth: Intermediate, medium

Summary

Amazon Bedrock AgentCore Browser has introduced OS Level Actions, a capability that lets AI agents interact with operating-system-level UI elements beyond the web layer. Previously, agents that drove the browser through the DOM (via Playwright or the Chrome DevTools Protocol, CDP) could not interact with native dialogs, security prompts, certificate choosers, or context menus. The update exposes direct OS control through the `InvokeBrowser` API, so agents can observe full-desktop screenshots, reason about them, and act on them with mouse and keyboard controls. Eight actions are supported across three groups: mouse control (click, move, drag, scroll), keyboard input (type, press, shortcut), and visual capture (screenshot). This closes a significant gap in browser automation coverage: agents can now dismiss print dialogs or respond to OS prompts, which appear frequently in production environments.
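
The eight actions can be modeled as a small catalog grouped by category. This is a minimal Python sketch, not the AgentCore API: the article names only `mouseClick`, `keyShortcut`, and `screenshot`, so the other five action names are hypothetical, extrapolated from that pattern, and the `{"action": ..., "params": ...}` payload shape is an illustrative assumption rather than the documented `InvokeBrowser` request format.

```python
from typing import Any, Dict, Tuple

# ASSUMPTION: only mouseClick, keyShortcut and screenshot are named in the
# article; the remaining five names are hypothetical, following that pattern.
ACTION_CATEGORIES: Dict[str, Tuple[str, ...]] = {
    "mouse": ("mouseClick", "mouseMove", "mouseDrag", "mouseScroll"),
    "keyboard": ("keyType", "keyPress", "keyShortcut"),
    "capture": ("screenshot",),
}

# Flattened view of every supported action, in category order.
ALL_ACTIONS: Tuple[str, ...] = tuple(
    name for names in ACTION_CATEGORIES.values() for name in names
)


def category_of(action: str) -> str:
    """Return the category an action belongs to, or raise for unknown names."""
    for category, names in ACTION_CATEGORIES.items():
        if action in names:
            return category
    raise ValueError(f"unknown OS-level action: {action!r}")


def build_action(action: str, **params: Any) -> Dict[str, Any]:
    """Build a hypothetical action payload; field names are illustrative only."""
    category_of(action)  # reject names outside the eight-action catalog
    return {"action": action, "params": params}
```

Validating names against a fixed catalog before dispatch means a typo in a model-generated action fails locally instead of as a remote API error.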

Key takeaway

For AI Engineers building web automation agents, OS Level Actions in Amazon Bedrock AgentCore Browser eliminate critical blockers posed by native OS UI. You should integrate the `InvokeBrowser` API to handle scenarios like security prompts, print dialogs, and keyboard shortcuts, ensuring your agents can complete complex workflows that extend beyond the browser's DOM. This capability allows for more robust and comprehensive automation, reducing manual intervention in production environments.

Key insights

OS Level Actions in AgentCore Browser enable AI agents to interact with native OS UI elements.

Principles

Web automation has a hard boundary at the DOM.
Vision models require actionable targets.
An action-screenshot-reaction loop enables dynamic UI interaction.

Method

Agents dispatch OS-level actions through the `InvokeBrowser` API, capture a full-desktop screenshot, send it to a vision model for reasoning, and then execute the next action the model selects.
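
The action-screenshot-reaction loop can be sketched in Python as follows. This is not the AgentCore implementation: `invoke` is a hypothetical stand-in for an `InvokeBrowser` call, `decide` is a hypothetical stand-in for a vision-model call, and the `image` response field and payload shape are assumptions.

```python
from typing import Any, Callable, Dict, List, Optional

Action = Dict[str, Any]


def run_os_action_loop(
    invoke: Callable[[Action], Dict[str, Any]],
    decide: Callable[[str, bytes], Optional[Action]],
    goal: str,
    max_steps: int = 10,
) -> List[Action]:
    """Observe, reason, act, repeat until the model reports the goal is met.

    ASSUMPTION: `invoke` wraps the InvokeBrowser API and `decide` wraps a
    vision model; both are hypothetical callables supplied by the caller.
    Returns the actions that were executed, in order.
    """
    executed: List[Action] = []
    for _ in range(max_steps):
        # 1. Observe: capture the full desktop, not just the page viewport.
        shot = invoke({"action": "screenshot", "params": {}})["image"]
        # 2. Reason: the model picks the next action, or None when done.
        next_action = decide(goal, shot)
        if next_action is None:
            return executed
        # 3. Act: dispatch the OS-level action; the next iteration observes its effect.
        invoke(next_action)
        executed.append(next_action)
    raise RuntimeError(f"goal not reached within {max_steps} steps")
```

Injecting `invoke` and `decide` keeps the loop testable with fakes, and `max_steps` bounds the loop if the model never signals completion.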

In practice

Use `mouseClick` for native dialog buttons.
Employ `keyShortcut` for OS-level key combinations.
Capture full desktop with `screenshot` for vision model input.
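
The three practices above can be sketched as request builders. The parameter names (`x`, `y`, `button`, `keys`), the lowercase key names, and the bounds check against the desktop size are all hypothetical, chosen for illustration; the real request schema is defined by AgentCore and demonstrated in awslabs/agentcore-samples.

```python
from typing import Any, Dict, Sequence

# ASSUMPTION: hypothetical modifier key names, listed in the order emitted.
MODIFIERS = ("ctrl", "alt", "shift", "meta")


def mouse_click(x: int, y: int, screen_w: int, screen_h: int,
                button: str = "left") -> Dict[str, Any]:
    """Click at desktop pixel coordinates, e.g. a native dialog's OK button."""
    if not (0 <= x < screen_w and 0 <= y < screen_h):
        raise ValueError(f"({x}, {y}) is outside the {screen_w}x{screen_h} desktop")
    return {"action": "mouseClick", "params": {"x": x, "y": y, "button": button}}


def key_shortcut(keys: Sequence[str]) -> Dict[str, Any]:
    """Press an OS-level key combination; modifiers are ordered before the final key."""
    normalized = [k.lower() for k in keys]
    mods = [m for m in MODIFIERS if m in normalized]
    others = [k for k in normalized if k not in MODIFIERS]
    if len(others) != 1:
        raise ValueError("a shortcut needs exactly one non-modifier key")
    return {"action": "keyShortcut", "params": {"keys": mods + others}}


def screenshot() -> Dict[str, Any]:
    """Request a full-desktop capture to feed the vision model."""
    return {"action": "screenshot", "params": {}}
```

For example, `key_shortcut(["P", "Ctrl"])` yields the normalized key list `["ctrl", "p"]`, so model output with inconsistent casing or ordering still produces one canonical request.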

Topics

Amazon Bedrock AgentCore Browser
OS Level Actions
Browser Automation
InvokeBrowser API
Action-Screenshot-Reaction Loop

Code references

awslabs/agentcore-samples

Best for: AI Engineer, Machine Learning Engineer, MLOps Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.