Reading and Writing with Projections

Source: Chris McCormick · Field: Technology & Digital, Artificial Intelligence & Machine Learning · Depth: Intermediate, medium

Summary

Transformers store, retrieve, and modify data using "feature directions" and "projections" within their model space, which lets them pack in more features than there are dimensions. The article illustrates this by encoding and decoding speaker settings (bass, volume, treble) in a two-dimensional space. The example starts with axis-aligned directions, but neural networks typically learn arbitrary, near-orthogonal directions, which introduce slight interference between features and small errors on recovery. Transformers update a stored value by encoding the adjustment as a residual vector and adding it to the existing representation. Packing three features into two dimensions produces significant interference, but a higher-dimensional space, such as a 4,096-dimensional embedding, can hold many more features with minimal interference thanks to the "blessing of dimensionality." Models exploit this by learning functional groups of directions and by relying on sparsity: only a subset of directions carries data at any given time.
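The dimensionality claim can be checked with a small experiment. The sketch below is illustrative only and is not taken from the original article: it uses random unit vectors as a stand-in for learned feature directions (an assumption), and the helper names `random_unit_vector` and `mean_abs_overlap` are hypothetical. It measures how much pairs of directions overlap as the dimension grows.

```python
import math
import random

def random_unit_vector(dim, rng):
    """Draw a random direction: Gaussian components, scaled to length 1."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def mean_abs_overlap(dim, n_directions=20, seed=0):
    """Average |dot product| over all pairs of random unit directions.

    A value near 0 means the directions are close to orthogonal, so
    reading one feature picks up little of the others.
    """
    rng = random.Random(seed)
    dirs = [random_unit_vector(dim, rng) for _ in range(n_directions)]
    overlaps = [
        abs(sum(a * b for a, b in zip(dirs[i], dirs[j])))
        for i in range(n_directions)
        for j in range(i + 1, n_directions)
    ]
    return sum(overlaps) / len(overlaps)

# Compare a tiny space, a mid-sized one, and a 4,096-dimensional one.
overlap_by_dim = {dim: mean_abs_overlap(dim) for dim in (2, 64, 4096)}
for dim, overlap in overlap_by_dim.items():
    print(f"dim={dim:>5}  mean |overlap| = {overlap:.4f}")
```

In two dimensions the directions overlap heavily, while in 4,096 dimensions the average overlap falls close to zero, which is the intuition behind fitting many near-orthogonal directions into a large embedding.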

Key takeaway

For AI engineers designing or debugging Transformer architectures, it helps to understand how models use feature directions and projections. This mechanism lets a model store more features than it has embedding dimensions, but because the learned directions are only near-orthogonal, features can interfere with one another. Consider how dimensionality and sparsity shape feature representation when you reason about model capacity and about corruption of stored values during updates.

Key insights

Transformers use projections and feature directions to store and modify data, enabling more features than dimensions.

Method

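Data is encoded by multiplying the input values by a projection matrix, and decoded by multiplying the encoded vector by the transpose of that same matrix, so that each value is read back as a dot product with its feature direction. Adjustments are encoded the same way, as residual vectors, and added to the encoded data.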
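The method can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the article's own code: the setting values (3, 7, 5), the direction angles, and the helper names `encode`, `decode`, and `add` are all hypothetical, chosen only to make the arithmetic easy to follow.

```python
import math

def encode(values, directions):
    """Write: scale each feature direction by its value and sum them."""
    dim = len(directions[0])
    return [sum(v * d[k] for v, d in zip(values, directions)) for k in range(dim)]

def decode(vector, directions):
    """Read: dot product of the stored vector with each feature direction."""
    return [sum(x * dk for x, dk in zip(vector, d)) for d in directions]

def add(a, b):
    """Element-wise vector addition, used to apply a residual update."""
    return [x + y for x, y in zip(a, b)]

# Case 1: two features on axis-aligned directions (assumed values).
axis_dirs = [[1.0, 0.0], [0.0, 1.0]]        # bass, volume
stored = encode([3.0, 7.0], axis_dirs)
recovered = decode(stored, axis_dirs)       # exact: [3.0, 7.0]

# Update: raise volume by 2 by adding an encoded residual vector.
residual = encode([0.0, 2.0], axis_dirs)
updated = decode(add(stored, residual), axis_dirs)  # exact: [3.0, 9.0]

# Case 2: three features packed into two dimensions, using unit
# directions 120 degrees apart (an assumed layout for illustration).
packed_dirs = [
    [math.cos(math.radians(a)), math.sin(math.radians(a))]
    for a in (90, 210, 330)                 # bass, volume, treble
]
packed = encode([3.0, 7.0, 5.0], packed_dirs)
noisy = decode(packed, packed_dirs)         # approximately [-3.0, 3.0, 0.0]

print("axis-aligned:", recovered, "after update:", updated)
print("3 features in 2 dims:", noisy)
```

With orthogonal directions the values come back exactly. With three directions in two dimensions each pair overlaps by -0.5, so every readout is contaminated by the other two values, which is the significant interference the summary describes.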

Best for: AI Engineer, Machine Learning Engineer, AI Student

Editorial summary, takeaway, and curation by AIssential. Original article published by Chris McCormick.