The Internet of Probability
Summary
Large language models (LLMs) are transforming the internet from a knowledge repository into an "engine of plausibility," exacerbating a pre-existing "post-truth reality" characterized by misinformation and a decline in verifiable facts. LLMs operate by computing the statistical probability of token sequences, generating authoritative-sounding but often factually ungrounded content, a phenomenon termed "careless speech" by Oxford Internet Institute researchers. This probabilistic architecture, combined with training on a "poisoned well" of internet data rife with adversarial content and misinformation, creates an "epistemic incest" feedback loop: AI-generated errors contaminate public knowledge repositories, which are then used to train subsequent models. The industry's reframing of this synthesis as "remix" further erodes provenance, making it difficult to distinguish legitimate probabilistic science from confident confabulation and ultimately putting democratic deliberation and scientific progress at risk.
Key takeaway
If you are an AI ethicist or policy maker weighing the societal impact of LLMs, recognize that current AI systems fundamentally degrade the information commons by prioritizing statistical plausibility over verifiable truth and provenance. This calls for developing and enforcing policies that mandate structural requirements for citation, attribution, and ontological grounding in AI models. Your efforts must focus on re-attaching LLM outputs to robust evidentiary infrastructures to prevent further epistemic degradation and safeguard democratic discourse.
Key insights
LLMs transform the internet into a probability machine, eroding truth and provenance through statistical generation and contaminated training data.
Principles
- LLMs compute probability, not truth.
- Bias is structural in LLMs, not an anomaly.
- Provenance is crucial for verifiable knowledge.
Method
LLMs predict the most statistically likely next token based on patterns learned from massive text corpora. They optimize for fluency rather than for factual accuracy or engagement with what the language means, which leads to "careless speech."
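To make the idea of next-token prediction concrete, here is a minimal sketch using a toy bigram counter. It is not how production LLMs are built (they use neural networks conditioned on far longer contexts), and the three-sentence corpus is invented purely for illustration. The function names `train_bigram`, `next_token_probs`, and `most_likely_next` are hypothetical. The sketch only shows that a frequency-driven predictor returns whatever continuation is most common in its training text, whether or not that continuation is true.

```python
from collections import Counter, defaultdict


def train_bigram(corpus):
    """Count how often each token follows each preceding token."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        tokens = sentence.lower().split()
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    return counts


def next_token_probs(counts, prev):
    """Turn raw follow-up counts for `prev` into probabilities."""
    followers = counts.get(prev, Counter())
    total = sum(followers.values())
    if total == 0:
        return {}
    return {tok: c / total for tok, c in followers.items()}


def most_likely_next(counts, prev):
    """Pick the single most probable next token, or None if unseen."""
    probs = next_token_probs(counts, prev)
    return max(probs, key=probs.get) if probs else None


# Invented toy corpus: the incorrect claim appears more often than the
# correct one, standing in for misinformation in training data.
corpus = [
    "the capital of australia is sydney",
    "the capital of australia is sydney",
    "the capital of australia is canberra",
]

counts = train_bigram(corpus)
probs = next_token_probs(counts, "is")
prediction = most_likely_next(counts, "is")

print(probs)
print(prediction)
```

In this toy setup the predictor prefers "sydney" after "is" because it appears twice as often as "canberra" in the corpus. Nothing in the procedure checks the claim against an external source, which is the gap the article attributes to probabilistic generation.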
In practice
- Recognize LLM output as probabilistic, not factual.
- Verify AI-generated information independently.
- Prioritize provenance in knowledge systems.
Topics
- Large Language Models
- Misinformation & Disinformation
- Epistemic Crisis
- Data Provenance
- AI Bias
Best for: CTO, VP of Engineering/Data, Director of AI/ML, AI Ethicist, AI Researcher, Policy Maker
Editorial summary, takeaway, and curation by AIssential. Original article published by Intentional Arrangement.