‘Manners for machines’: how new rules could stop AI scrapers destroying the internet

· Source: Artificial intelligence (AI) – The Conversation · Field: Legal & Regulatory — Intellectual Property & Patents, Regulatory Affairs & Government Relations, Legal Technology (LegalTech) · Depth: Intermediate, short

Summary

Australians are highly anxious about artificial intelligence, driven by concerns over misinformation, job displacement, and the uncompensated use of creative works to train AI models. AI companies routinely scrape content from a wide range of online sources, including pirated books, social media, university repositories, and news outlets. This practice was long tolerated under the "open web" ethos, but that detente is faltering as news organizations block scrapers and creators limit what they share. Existing copyright exceptions, such as fair dealing, are inadequate for generative AI. In response, Creative Commons proposes "CC Signals," a voluntary framework that lets creators attach machine-readable instructions to their content, specifying which machine uses are permitted and under what conditions, based on the principles of consent, compensation, and credit. The framework aims to give creators more control while keeping high-quality data available for AI, and it may particularly benefit smaller creators, although enforcing compensation remains a challenge.

Key takeaway

CTOs and VPs of Engineering evaluating data acquisition strategies should consider integrating the proposed CC Signals framework into their teams' scraping and data ingestion pipelines. This voluntary system, akin to robots.txt, offers a standardized way to respect creator preferences for AI use. Adopting it could mitigate future legal risk and help secure access to higher-quality, ethically sourced data, which matters for reducing AI bias and improving model utility.
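As a rough illustration of what such an integration might look like, the sketch below filters scraped documents by the signals attached to them before they enter a training set. The field names (`permitted_uses`, `conditions`), the use label `"ai-training"`, and the condition labels are all hypothetical and are not taken from the CC Signals proposal; treating unsignalled content as excluded is likewise an assumption made here for caution, not a rule of the framework.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    url: str
    text: str
    # Hypothetical: machine uses the declaring party permits, e.g. {"ai-training"}.
    permitted_uses: set[str] = field(default_factory=set)
    # Hypothetical: conditions attached to that permission, e.g. {"credit"}.
    conditions: set[str] = field(default_factory=set)


def filter_for_use(docs, intended_use, conditions_we_honor):
    """Keep only documents whose signals permit the intended use
    and whose conditions the pipeline is able to meet."""
    kept = []
    for doc in docs:
        # No permission declared for this use: skip (conservative assumption).
        if intended_use not in doc.permitted_uses:
            continue
        # Skip if the creator asks for something we cannot honor.
        if not doc.conditions <= conditions_we_honor:
            continue
        kept.append(doc)
    return kept


docs = [
    Document("https://example.org/a", "A", {"ai-training"}, {"credit"}),
    Document("https://example.org/b", "B", {"search-indexing"}),
    Document("https://example.org/c", "C", {"ai-training"}, {"compensation"}),
    Document("https://example.org/d", "D"),  # no signal attached
]
kept = filter_for_use(docs, "ai-training", {"credit"})
print([d.url for d in kept])
```

In this sketch only the first document survives: the second permits a different use, the third requires compensation that the pipeline has not committed to, and the fourth carries no signal at all.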

Key insights

CC Signals offer a voluntary, machine-readable framework for creators to manage AI access and use of their online content.

Principles

Method

The CC Signals framework allows a "declaring party" to attach machine-readable instructions to content, specifying permitted machine uses and conditions, similar to how robots.txt functions.
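To make the robots.txt comparison concrete, here is a minimal sketch of how a crawler consults robots.txt rules today, using Python's standard `urllib.robotparser`. The crawler name `ExampleAIBot`, the rules, and the URL are invented for illustration. This shows only the existing robots.txt mechanism, not CC Signals syntax, which the summary does not specify.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; "ExampleAIBot" is a made-up crawler name.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /articles/

User-agent: *
Allow: /
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that has already been fetched."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)
url = "https://example.org/articles/story.html"

# The named AI crawler is told to stay out of /articles/.
ai_allowed = parser.can_fetch("ExampleAIBot", url)
# Any other crawler falls through to the wildcard rule.
other_allowed = parser.can_fetch("SomeOtherBot", url)

print(ai_allowed, other_allowed)
```

Like robots.txt, compliance with such instructions is voluntary: the file states a preference and relies on the crawler to check it and act accordingly.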

In practice

Best for: CTO, VP of Engineering/Data, Director of AI/ML, AI Ethicist, Policy Maker, Legal Professional

Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial intelligence (AI) – The Conversation.