The Essence of LLM: Function

2026-05-16 · Source: AI on Medium · Field: Technology & Digital — Artificial Intelligence & Machine Learning · Depth: Novice, medium

Summary

An LLM is fundamentally a mathematical function that takes a sequence of tokens as input and outputs a probability distribution over the vocabulary for the next token. The function operates in a d-dimensional space: Embedding maps each token to a vector, with d typically in the thousands (for example 4096 or 8192). During training, semantically similar words end up closer together in this space. The Attention mechanism adjusts each token's representation based on its context, using Query, Key, and Value vectors to compute a learnable, dynamic weighted sum. Multi-Head Attention runs several such operations in parallel, each free to learn a different pattern. The Feed-Forward Networks (FFNs) inside each Transformer block are described as the place where the model's "facts" or knowledge are stored. Training is driven by a single objective, Next Token Prediction: by learning to predict the token that comes next, the model implicitly acquires grammar, semantics, logic, and world knowledge. The apparent intelligence of LLMs emerges from calling this function repeatedly, in an autoregressive loop where each predicted token is appended to the input for the next call.
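The functional view in the summary can be sketched in a few lines of Python. This is only an illustration under loud assumptions: the four-word vocabulary, the `BIGRAM` table, and the function names are all invented here, and the toy function looks only at the last token, whereas a real model conditions on the whole sequence.

```python
# Toy stand-in for an LLM: a pure function from a token sequence to a
# probability distribution over a tiny vocabulary. The table is invented
# for illustration; a real model computes these numbers with embeddings,
# attention, and feed-forward layers.
VOCAB = ["the", "cat", "sat", "<eos>"]

BIGRAM = {
    "the": [0.0, 0.9, 0.1, 0.0],
    "cat": [0.0, 0.0, 0.8, 0.2],
    "sat": [0.1, 0.0, 0.0, 0.9],
}


def next_token_distribution(tokens):
    """Return P(next token | tokens); this toy version uses only the last token."""
    return BIGRAM[tokens[-1]]


def generate(prompt, max_new_tokens=10):
    """Autoregressive loop: call the function, append the likeliest token, repeat."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        probs = next_token_distribution(tokens)
        best = max(range(len(VOCAB)), key=lambda i: probs[i])
        next_token = VOCAB[best]
        if next_token == "<eos>":
            break
        tokens.append(next_token)
    return tokens


print(generate(["the"]))  # ['the', 'cat', 'sat']
```

The loop picks the most likely token at every step (greedy decoding). Sampling from the distribution instead would give varied outputs from the same underlying function.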

Key takeaway

For AI students and software engineers who want to demystify LLMs, it helps to view the model as a deterministic mathematical function: the same input tokens produce the same probability distribution, and the variation seen in outputs comes from how the next token is sampled from that distribution. This perspective helps you interpret model behavior, think of prompt engineering as adjusting the function's input vectors, and understand fundamental constraints such as context window limits. Adopting this functional view lets you stop treating LLMs as opaque "black boxes" and use them more effectively.

Key insights

An LLM is a mathematical function mapping token sequences to probability distributions, driving all its emergent behaviors.
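The "probability distribution" in this statement is usually produced by applying softmax to one raw score (logit) per vocabulary entry. A minimal sketch, assuming a hypothetical four-token vocabulary and made-up logit values:

```python
import math


def softmax(logits):
    """Turn arbitrary real-valued scores into a probability distribution."""
    peak = max(logits)  # subtracting the max avoids overflow in exp
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical scores for a four-token vocabulary.
logits = [2.0, 1.0, 0.1, -1.0]
probs = softmax(logits)
print(probs)
```

Every entry is positive and the entries sum to 1, so the highest-scoring token becomes the most probable one without the others being ruled out.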

Principles

A word's "meaning" is its position in high-dimensional space.
Attention is a learnable, dynamic weighted sum.
Next token prediction is the ultimate compression of language understanding.
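The second principle, attention as a dynamic weighted sum, can be made concrete with scaled dot-product attention for a single query. This is a sketch under assumptions: the vectors below are hand-written for illustration, whereas in a trained model the Query, Key, and Value vectors come from learned projections of the token representations, and causal masking is omitted.

```python
import math


def softmax(scores):
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attend(query, keys, values):
    """Scaled dot-product attention for one query vector."""
    scale = math.sqrt(len(query))
    # One weight per key: how relevant that position is to this query.
    weights = softmax([dot(query, k) / scale for k in keys])
    dim = len(values[0])
    # The output is the weighted sum of the value vectors.
    output = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return weights, output


# Hand-written example vectors (assumptions, not learned parameters).
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]
weights, output = attend([4.0, 0.0], keys, values)
print(weights)
print(output)
```

The weights are "dynamic" because they are recomputed from the query and keys for every input, and "learnable" because in a real model the projections that produce those vectors are trained.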

Method

An LLM maps tokens to d-dimensional vectors (Embedding), adjusts those representations dynamically via Attention, and stores knowledge in FFNs; training optimizes all of these components for a single objective, next token prediction.
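"Optimizing for next token prediction" is commonly implemented as minimizing the cross-entropy between the predicted distribution and the token that actually came next. The sketch below assumes that standard loss; the function name and the example probabilities are invented for illustration.

```python
import math


def next_token_loss(predicted_probs, target_ids):
    """Average cross-entropy over positions.

    predicted_probs[t] is the model's distribution at position t;
    target_ids[t] is the index of the token that actually came next.
    """
    losses = [
        -math.log(probs[target])
        for probs, target in zip(predicted_probs, target_ids)
    ]
    return sum(losses) / len(losses)


# Hypothetical predictions over a 3-token vocabulary at two positions.
confident = [[0.9, 0.05, 0.05], [0.1, 0.8, 0.1]]
unsure = [[0.4, 0.3, 0.3], [0.3, 0.4, 0.3]]
targets = [0, 1]

print(next_token_loss(confident, targets))
print(next_token_loss(unsure, targets))
```

The loss falls as the model puts more probability on the correct next token, so lowering it across a large corpus is what pushes the function toward capturing grammar and facts.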

In practice

View LLM errors as function misfits, not "AI unreliability."
Prompt Engineering adjusts input vectors for better function fit.
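Context window limits stem largely from Attention's O(n²) cost: every token attends to every other token, so compute and memory grow with the square of the sequence length n.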
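The quadratic growth behind the last point is easy to check with arithmetic. The helper below is hypothetical and counts only the query-key scores of one attention head in full self-attention, ignoring optimizations that real systems may apply.

```python
def attention_score_count(seq_len):
    """Query-key scores computed by one head when every token attends to every token."""
    return seq_len * seq_len


for n in (1_000, 2_000, 4_000):
    print(n, attention_score_count(n))
# 1000 1000000
# 2000 4000000
# 4000 16000000
```

Doubling the context length quadruples the number of scores, which is why long contexts become expensive quickly.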

Topics

Large Language Models
Token Embedding
Attention Mechanism
Transformer Architecture
Feed-Forward Networks

Best for: AI Student, Software Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by AI on Medium.