Measuring the entropy of English

2026-06-12 · Source: 3Blue1Brown · Field: Science & Research — Mathematics & Computational Sciences, Research Methodology & Innovation · Depth: Novice, quick

Summary

Claude Shannon ran a series of experiments to measure how predictable, and therefore how compressible, English text is. His first attempts tabulated the statistics of short character sequences, an approach that breaks down for longer sequences where context matters most. He then devised the "Betty Experiment," in which his wife predicted each letter of a passage in turn; correct guesses were replaced with dashes, so that only the letters she could not predict remained. In his 1950 paper, "Prediction and Entropy of Printed English," Shannon refined this by recording how many guesses participants needed to identify each next letter. He combined these guess counts with short-sequence statistics to estimate the probabilities people implicitly assign to the next letter. His final estimate was that English, given at least 100 characters of context, can be compressed to approximately 1 bit per character, a figure he reached by probing human intelligence rather than by analyzing data alone.

Key takeaway

For AI scientists designing language models, Shannon's work shows that approaching optimal compression, around 1 bit per character, takes more than short-range statistics. A model has to "engineer intelligence": it must capture the context-dependent predictability of language that human readers exploit. In practice, this means prioritizing contextual understanding over simply collecting more data for statistical inference.

Key insights

Shannon's entropy estimate for English showed that measuring its true compressibility requires tapping human prediction, not just tabulating statistics.

Principles

The more predictable a text is, the further it can be compressed.
Longer contexts make text more predictable.
Human guesses reveal the probabilities people implicitly assign to language.
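
The first two principles can be illustrated with a minimal sketch that is not taken from the source. It estimates the entropy of the next character, in bits, given the previous `context_len` characters, using raw counts from a sample string. The function name `conditional_entropy` and the toy input are assumptions made for illustration only.

```python
import math
from collections import Counter


def conditional_entropy(text: str, context_len: int) -> float:
    """Empirical entropy (bits per character) of the next character,
    given the preceding `context_len` characters of `text`."""
    context_counts = Counter()
    pair_counts = Counter()
    for i in range(context_len, len(text)):
        context = text[i - context_len:i]  # empty string when context_len == 0
        context_counts[context] += 1
        pair_counts[(context, text[i])] += 1

    total = sum(pair_counts.values())
    entropy = 0.0
    for (context, _char), n in pair_counts.items():
        p_joint = n / total                  # P(context, next char)
        p_cond = n / context_counts[context]  # P(next char | context)
        entropy -= p_joint * math.log2(p_cond)
    return entropy


if __name__ == "__main__":
    sample = "abababab"  # toy input, not real English
    for k in (0, 1):
        print(k, conditional_entropy(sample, k))
```

On the toy string, no context leaves one bit of uncertainty per character, while a single character of context removes it entirely. On real text, counts for long contexts become too sparse to trust, which is consistent with the limitation of sequence statistics noted in the summary.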

Method

In his 1950 method, Shannon had participants guess the next letter of a text, recorded how many guesses each letter took, and combined these counts with short-sequence statistics to estimate the implicit probabilities of English text.
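
As a hedged sketch of how guess counts can be turned into an entropy estimate, the code below computes the upper and lower bounds Shannon derived from the frequency q_i with which the correct letter is found on the i-th guess: the upper bound is the entropy of the q distribution, and the lower bound is the sum over i of i (q_i - q_{i+1}) log2 i. The function name `shannon_bounds` and the counts in the example are hypothetical and are not Shannon's data. The lower bound assumes the frequencies are non-increasing in i.

```python
import math


def shannon_bounds(guess_counts: list[int]) -> tuple[float, float]:
    """Return (lower, upper) bounds in bits per character.

    guess_counts[i] is how many times the correct letter was
    identified on guess number i + 1.
    """
    total = sum(guess_counts)
    q = [count / total for count in guess_counts]

    # Upper bound: entropy of the guess-number distribution.
    upper = -sum(p * math.log2(p) for p in q if p > 0)

    # Lower bound: sum of i * (q_i - q_{i+1}) * log2(i), with q past the end taken as 0.
    # Assumes q is non-increasing, as for an ideal predictor.
    q_next = q[1:] + [0.0]
    lower = sum(
        (i + 1) * (q[i] - q_next[i]) * math.log2(i + 1)
        for i in range(len(q))
    )
    return lower, upper


if __name__ == "__main__":
    # Hypothetical tallies for illustration only.
    lower, upper = shannon_bounds([3, 1])
    print(f"{lower:.3f} <= entropy <= {upper:.3f}")
```

When every letter is guessed on the first try, both bounds are zero; when all guess numbers are equally likely, the two bounds coincide.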

Topics

Claude Shannon
Information Theory
English Language Entropy
Text Compression
Language Predictability
Human Prediction

Best for: AI Scientist, AI Student, Research Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by 3Blue1Brown.