You Can Fit a Million Nearly-Perpendicular Arrows in 768 Dimensions.
Summary
The article explains the fundamental reason word embeddings work so well: high-dimensional spaces can hold a vast number of "nearly-perpendicular" vectors. Human intuition, trained in three dimensions, says only three arrows can be truly perpendicular to one another. In a space of, say, 768 dimensions, however, a model can fit millions of arrows that all sit very close to right angles with each other. This counter-intuitive property lets a model represent far more distinct ideas than its number of dimensions would suggest. The author plans to demonstrate the idea with plain arithmetic, starting from small vector examples and raising the dimension step by step (2, 4, 100, then 768), ultimately showing how this "absurd fact" underpins every word embedding.
Key takeaway
For Machine Learning Engineers designing or debugging embedding-based systems, understanding that high-dimensional spaces can accommodate millions of nearly-perpendicular vectors is crucial. Your intuition from 3D space will mislead you about the true capacity of embeddings. This insight explains why models can represent such a wide array of concepts, and it can inform how you choose an embedding dimension and how you interpret operations in vector space.
Key insights
High-dimensional spaces enable millions of nearly-perpendicular vectors, explaining why embeddings effectively represent numerous distinct ideas.
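To see the capacity claim on a small scale, the sketch below draws random arrows and reports the worst (largest) |cos| over every pair. It is an illustration under our own assumptions, not the article's experiment: directions are sampled by normalizing Gaussian coordinates, and only 100 arrows are used (not a million) so pure Python finishes quickly. The helper names are hypothetical.

```python
import math
import random

def random_unit_vector(dim, rng):
    # Gaussian coordinates, then normalize: the direction is uniform on the sphere.
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def max_abs_cosine(dim, count, seed=0):
    """Largest |cos(theta)| over all pairs of `count` random arrows in `dim` dimensions."""
    rng = random.Random(seed)
    vecs = [random_unit_vector(dim, rng) for _ in range(count)]
    worst = 0.0
    for i in range(count):
        for j in range(i + 1, count):
            cos = sum(a * b for a, b in zip(vecs[i], vecs[j]))
            worst = max(worst, abs(cos))
    return worst

for dim in (3, 768):
    worst = max_abs_cosine(dim, count=100)
    # How far the least-perpendicular pair strays from a right angle.
    deviation = 90.0 - math.degrees(math.acos(worst))
    print(f"dim={dim:4d}  worst |cos| = {worst:.3f}  ({deviation:.1f} deg off perpendicular)")
```

In 3 dimensions some pair of the 100 arrows ends up nearly parallel, while in 768 dimensions even the worst of the 4,950 pairs stays within a few degrees of a right angle.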
Principles
- High-dimensional geometry defies 3D intuition.
- Embeddings leverage near-orthogonality for capacity.
- Random vectors become closer to perpendicular as the number of dimensions grows.
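The last principle can be quantified with a standard symmetry argument. This derivation is not taken from the article; it assumes arrows are drawn uniformly at random on the unit sphere.

```latex
\begin{aligned}
&\text{Fix a unit vector } u \in \mathbb{R}^d \text{ and draw } v \text{ uniformly from the unit sphere } S^{d-1}.\\
&\text{Rotate coordinates so that } u = e_1;\ \text{then } \cos\theta = u^\top v = v_1.\\
&\text{By symmetry } \mathbb{E}[v_1] = 0,\ \text{and since } \textstyle\sum_{i=1}^{d} v_i^2 = 1 \text{ with identically distributed } v_i:\quad \mathbb{E}[v_1^2] = \tfrac{1}{d}.\\
&\text{Hence } \mathbb{E}[\cos\theta] = 0,\quad \operatorname{Var}[\cos\theta] = \tfrac{1}{d},\quad \sqrt{\mathbb{E}[\cos^2\theta]} = \tfrac{1}{\sqrt{d}}.\\
&d = 3:\ \tfrac{1}{\sqrt{3}} \approx 0.577\ (\theta \approx 54.7^\circ);\qquad d = 768:\ \tfrac{1}{\sqrt{768}} \approx 0.036\ (\theta \approx 87.9^\circ).
\end{aligned}
```

The typical cosine between two random arrows shrinks like one over the square root of the dimension, which is why a 768-dimensional space leaves room for so many arrows near 90 degrees.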
Method
The article outlines a method: compute the angles between random arrows in spaces of increasing dimension (2, 4, 100, then 768) and observe how close to perpendicular they become, as a demonstration of embedding capacity.
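A minimal sketch of that sweep, assuming random arrows with standard-normal coordinates (our choice; the article's exact sampling scheme is not given here). The function names and the `pairs` and `seed` parameters are hypothetical. For each dimension it averages how far a random pair's angle sits from 90 degrees.

```python
import math
import random

def random_arrow(dim, rng):
    # Each coordinate drawn from a standard normal; the direction is uniform on the sphere.
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]

def angle_between(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    # Clamp before acos to absorb floating-point rounding.
    return math.degrees(math.acos(max(-1.0, min(1.0, dot / norm))))

def perpendicularity_report(dims=(2, 4, 100, 768), pairs=500, seed=42):
    """For each dimension, the mean of |90 deg - angle| over random pairs of arrows."""
    rng = random.Random(seed)
    report = {}
    for dim in dims:
        deviations = [
            abs(90.0 - angle_between(random_arrow(dim, rng), random_arrow(dim, rng)))
            for _ in range(pairs)
        ]
        report[dim] = sum(deviations) / pairs
    return report

for dim, dev in perpendicularity_report().items():
    print(f"{dim:4d} dims: random arrows are on average {dev:5.1f} deg away from perpendicular")
```

In 2 dimensions the average deviation comes out near 45 degrees (the angle is essentially uniform), while at 768 dimensions it drops to a couple of degrees, matching the trend the article describes.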
Topics
- High-Dimensional Geometry
- Word Embeddings
- Vector Orthogonality
- Embedding Capacity
- Machine Learning Theory
Best for: AI Scientist, Machine Learning Engineer, AI Student
Editorial summary, takeaway, and curation by AIssential. Original article published by AI on Medium.