The Architecture Behind Atlas: OpenAI’s New ChatGPT-based Browser

· Source: ByteByteGo Newsletter · Field: Technology & Digital — Software Development & Engineering, Artificial Intelligence & Machine Learning, Emerging Technologies & Innovation · Depth: Advanced, medium

Summary

OpenAI launched ChatGPT Atlas, a web browser built around an AI co-pilot. Its goals of instant startup, responsiveness with hundreds of tabs open, and rich animations called for a novel architecture. Instead of embedding Chromium directly in the application, OpenAI developed OWL (OpenAI's Web Layer), which runs Chromium in a separate process. The Atlas UI, built with SwiftUI and Metal, communicates with the Chromium host over Mojo through custom Swift and TypeScript bindings. OWL handles cross-process rendering with the macOS CALayerHost API, which shares GPU memory efficiently between processes, and handles input by translating NSEvents into WebInputEvents. The design also supports Agent mode, which composites pop-up UI elements into a single screenshot for the AI to read and runs ephemeral sessions in isolated, in-memory storage partitions for security and data privacy.
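The ephemeral-session idea in the summary can be illustrated with a small TypeScript sketch. It is not Atlas or Chromium code: `StoragePartition`, `PartitionManager`, and the session IDs are hypothetical names chosen for illustration, and a plain in-memory map stands in for whatever storage the real browser partitions. It shows only the general technique, assuming each agent session gets its own store, nothing is written to disk, and closing the session discards the data.

```typescript
// Hypothetical sketch: these classes are illustrative, not Atlas or Chromium APIs.

// One isolated, memory-only key/value store (think cookies or local storage).
class StoragePartition {
  private readonly entries = new Map<string, string>();

  set(key: string, value: string): void {
    this.entries.set(key, value);
  }

  get(key: string): string | undefined {
    return this.entries.get(key);
  }

  clear(): void {
    this.entries.clear();
  }
}

// Hands out one fresh partition per session and forgets it when the session ends.
class PartitionManager {
  private readonly partitions = new Map<string, StoragePartition>();
  private nextId = 1;

  // Start an ephemeral session backed by a new, empty partition.
  openSession(): string {
    const id = `agent-session-${this.nextId++}`;
    this.partitions.set(id, new StoragePartition());
    return id;
  }

  // Look up the partition for a live session; closed sessions are an error.
  partitionFor(sessionId: string): StoragePartition {
    const partition = this.partitions.get(sessionId);
    if (partition === undefined) {
      throw new Error(`unknown or closed session: ${sessionId}`);
    }
    return partition;
  }

  // End the session: wipe its data and drop the partition entirely.
  closeSession(sessionId: string): void {
    this.partitions.get(sessionId)?.clear();
    this.partitions.delete(sessionId);
  }
}

const manager = new PartitionManager();
const sessionA = manager.openSession();
const sessionB = manager.openSession();

manager.partitionFor(sessionA).set("login", "token-for-a");

// Session B has its own partition, so it cannot read session A's data.
const leaked = manager.partitionFor(sessionB).get("login");

manager.closeSession(sessionA);
```

In this sketch `leaked` is `undefined` because the two sessions never share a store, and asking for session A's partition after `closeSession` throws, since nothing from the session is kept.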

Key takeaway

AI architects and software engineers building complex applications with embedded web technologies should consider a decoupled architecture like OpenAI's OWL. Running the web engine in its own process gives faster UI startup, better stability through process isolation, and more flexibility for integrating advanced features such as AI agents, while minimizing the overhead of maintaining custom patches against the upstream web engine. Packaging the web engine as a prebuilt binary can also raise developer productivity, since application engineers do not have to build the engine themselves.

Key insights

Decoupling the browser UI from the web engine enables advanced AI-powered browsing features and developer agility.

Principles

Method

Run Chromium in a separate process (the OWL Host) from the main application (the OWL Client), with the two communicating over Mojo IPC. Display web content across the process boundary using CALayerHost, and translate native input events into web input events before forwarding them to the host.
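The client/host split and the input translation step can be sketched in TypeScript. This is a minimal illustration under stated assumptions, not OWL's or Mojo's actual API: `Channel`, `EngineHost`, `BrowserClient`, and the event shapes are all hypothetical, and a JSON string passed through a callback stands in for a real IPC pipe. The y-axis flip reflects one concrete difference a translator has to handle: AppKit window coordinates start at the bottom-left, while web coordinates start at the top-left.

```typescript
// Hypothetical sketch: no name here comes from OWL, Mojo, AppKit, or Blink.

// Native-style mouse event: origin at the bottom-left, y grows upward.
interface NativeMouseEvent {
  kind: "mouseDown" | "mouseUp" | "mouseMoved";
  x: number;
  yFromBottom: number;
}

// Web-style mouse event: origin at the top-left, y grows downward.
interface WebMouseEvent {
  type: "mousedown" | "mouseup" | "mousemove";
  x: number;
  y: number;
}

// Messages the client may send to the host.
type HostMessage =
  | { op: "navigate"; url: string }
  | { op: "input"; event: WebMouseEvent };

// Stand-in for an IPC pipe. Messages cross as serialized strings, so the
// two sides share no object references, as separate processes would not.
class Channel {
  private handler: (wire: string) => void = () => {};

  onMessage(handler: (wire: string) => void): void {
    this.handler = handler;
  }

  send(message: HostMessage): void {
    this.handler(JSON.stringify(message));
  }
}

const typeByKind = {
  mouseDown: "mousedown",
  mouseUp: "mouseup",
  mouseMoved: "mousemove",
} as const;

// Translate a native event into a web event, flipping the y axis.
function translateMouseEvent(e: NativeMouseEvent, viewHeight: number): WebMouseEvent {
  return { type: typeByKind[e.kind], x: e.x, y: viewHeight - e.yFromBottom };
}

// Host side: owns the engine state and only reacts to incoming messages.
class EngineHost {
  currentUrl = "about:blank";
  readonly receivedEvents: WebMouseEvent[] = [];

  constructor(channel: Channel) {
    channel.onMessage((wire) => this.dispatch(JSON.parse(wire) as HostMessage));
  }

  private dispatch(message: HostMessage): void {
    switch (message.op) {
      case "navigate":
        this.currentUrl = message.url;
        break;
      case "input":
        this.receivedEvents.push(message.event);
        break;
    }
  }
}

// Client side: the UI layer. It never touches engine state directly.
class BrowserClient {
  private readonly channel: Channel;
  private readonly viewHeight: number;

  constructor(channel: Channel, viewHeight: number) {
    this.channel = channel;
    this.viewHeight = viewHeight;
  }

  navigate(url: string): void {
    this.channel.send({ op: "navigate", url });
  }

  handleNativeMouse(e: NativeMouseEvent): void {
    this.channel.send({ op: "input", event: translateMouseEvent(e, this.viewHeight) });
  }
}

const pipe = new Channel();
const engineHost = new EngineHost(pipe);
const browserClient = new BrowserClient(pipe, 600);

browserClient.navigate("https://example.com/");
browserClient.handleNativeMouse({ kind: "mouseDown", x: 120, yFromBottom: 580 });
```

Here the click 580 points above the bottom of a 600-point-tall view arrives at the host as a `mousedown` at `y = 20`. Keeping every interaction behind a message type is what lets the UI start and run independently of the engine in a design like this.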

In practice

Topics

Best for: Software Engineer, AI Engineer, AI Architect

Editorial summary, takeaway, and curation by AIssential. Original article published by ByteByteGo Newsletter.