The emergence of the web data infrastructure layer for AI

· Source: MIT Technology Review · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Cloud Computing & IT Infrastructure, Data Science & Analytics · Depth: Intermediate, medium

Summary

A new web data infrastructure layer is becoming critical for AI because current models struggle with the dynamic, unstructured nature of web data. The web was not designed for the automated discovery and retrieval that AI applications demand, which contributes to AI hallucinations and abandoned projects; Gartner reports that 60% of AI projects without "AI-ready" data will fail. Bright Data's CEO, Or Lenchner, highlights the need for infrastructure that can mimic human browsing behavior at scale: navigating hundreds of millions of domains and billions of new URLs each week, and delivering real-time information while overcoming technical barriers. This specialized infrastructure, which can emulate a web user with 1,000+ parameters 80 billion times a day, is essential for applications like dynamic pricing and trademark tracking. A survey found that 56% of AI practitioners need real-time web data to improve trust and that 97% of AI organizations depend on such infrastructure, though 90% feel restricted.

Key takeaway

If you are an AI architect or MLOps engineer building AI systems that require current, reliable data, prioritize specialized web data infrastructure. Your models need real-time, trustworthy information to avoid stale answers and reduce hallucinations. Invest in platforms designed for large-scale, low-latency data retrieval and orchestration, and make sure they comply with privacy frameworks such as GDPR and CCPA. Doing so positions your organization to build more responsive and reliable AI systems.
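One way to act on the "avoid stale answers" point is to drop retrieved documents that are older than a freshness budget before they reach the model. The sketch below is illustrative only: `RetrievedDoc`, `fresh_only`, and the one-hour budget are hypothetical names and values, not taken from the article or any specific platform.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class RetrievedDoc:
    """Hypothetical record for a page fetched from the web."""
    url: str
    fetched_at: datetime  # timezone-aware fetch time
    text: str


def fresh_only(docs, max_age, now=None):
    """Keep only documents fetched within max_age of now."""
    if now is None:
        now = datetime.now(timezone.utc)
    return [doc for doc in docs if now - doc.fetched_at <= max_age]


# Example: with a one-hour budget, the day-old page is filtered out.
reference_time = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
docs = [
    RetrievedDoc("https://example.com/a", reference_time - timedelta(minutes=5), "recent"),
    RetrievedDoc("https://example.com/b", reference_time - timedelta(days=1), "stale"),
]
kept = fresh_only(docs, max_age=timedelta(hours=1), now=reference_time)
```

The right budget depends on the application: a dynamic-pricing feed may tolerate minutes, while slower-moving reference content may tolerate days.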

Key insights

Specialized web data infrastructure is critical for AI to access real-time, trustworthy, and contextually relevant information at scale.

Method

A web data infrastructure platform emulates human browsing behavior at scale. It reaches content on JavaScript-heavy sites and on sites protected by antibot software by presenting itself as a web user with identifying information: an IP address, a location, and 1,000+ parameters.
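The idea of a request carrying a user-like profile can be sketched with Python's standard library. This is a minimal illustration under stated assumptions, not how any particular platform works: `BrowsingProfile` and `build_request` are hypothetical names, the three fields stand in for a tiny subset of the 1,000+ parameters mentioned above, port 8080 and the example addresses are placeholders, and the request is built but never sent. Real platforms also render JavaScript in full browsers, which this sketch does not attempt.

```python
from dataclasses import dataclass
from urllib.request import Request


@dataclass(frozen=True)
class BrowsingProfile:
    """Hypothetical bundle of user-like parameters for one request."""
    user_agent: str
    accept_language: str  # stands in for the user's location/locale
    proxy_ip: str         # exit IP the request would be routed through


def build_request(url: str, profile: BrowsingProfile) -> Request:
    """Attach the profile's headers and proxy to a request without sending it."""
    headers = {
        "User-Agent": profile.user_agent,
        "Accept-Language": profile.accept_language,
        "Accept": "text/html,application/xhtml+xml",
    }
    request = Request(url, headers=headers)
    # Route through the profile's exit IP (port 8080 is an assumed placeholder).
    request.set_proxy(f"{profile.proxy_ip}:8080", "http")
    return request


# Example profile; 203.0.113.7 is a documentation-only address.
profile = BrowsingProfile(
    user_agent="Mozilla/5.0 (X11; Linux x86_64)",
    accept_language="en-GB,en;q=0.9",
    proxy_ip="203.0.113.7",
)
request = build_request("http://example.com/pricing", profile)
```

Keeping the profile as an immutable value makes it easy to vary one parameter at a time (for example, the exit IP per region) while the rest of the request stays the same.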

Best for: CTO, VP of Engineering/Data, AI Architect, MLOps Engineer, Director of AI/ML

Editorial summary, takeaway, and curation by AIssential. Original article published by MIT Technology Review.