Geometry Without Coordinates
Summary
This article introduces kernels, affinities, and graphs as geometric tools for machine learning when the structure of the objects is relational rather than coordinate-based. It distinguishes distance functions from similarity functions, emphasizing that similarity, which is often the task-relevant notion, asks what objects have in common. Kernels are presented as generalized inner products that measure alignment in a hidden, possibly high-dimensional feature space without computing the transformation explicitly; the examples given are the linear, polynomial, and Gaussian (RBF) kernels. The article explains how kernels make complex geometries workable, such as separating data that is not linearly separable, and extends the idea to learned kernels such as self-attention in Transformers, which are dynamic and asymmetric. It also discusses bandwidth as a critical geometric choice, affinity matrices, neighborhood graphs, and spectral methods such as Laplacian eigenvectors and diffusion maps for uncovering global structure from local relations.
Key takeaway
If you are a machine learning engineer developing models where raw coordinate proximity is insufficient, consider adopting kernel-based or graph-based geometric approaches. Your choice of kernel, similarity function, or graph construction (e.g., k-NN, ε-neighborhood) is a fundamental modeling commitment that defines how your system perceives relationships between objects. Experiment with different kernels and bandwidths, treating them as geometric decisions, to uncover latent structure that improves model performance and interpretability.
Key insights
Relational geometry, defined by kernels and graphs, often reveals structure that raw coordinate-based methods obscure.
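As a hedged illustration of structure emerging from relations alone, the sketch below takes a small hand-made affinity matrix (two tightly linked triples joined by one weak edge; the weights are invented for this example) and approximates the graph Laplacian eigenvector with the smallest nonzero eigenvalue by shifted power iteration. The function name `fiedler_vector`, the shift, and the iteration count are assumptions of this sketch, not something the article prescribes.

```python
def fiedler_vector(W, iters=500):
    """Approximate the Laplacian eigenvector with the smallest nonzero eigenvalue.

    W is a symmetric affinity matrix (list of lists) of a connected graph.
    """
    n = len(W)
    deg = [sum(row) for row in W]
    # Unnormalized graph Laplacian L = D - W.
    L = [[(deg[i] if i == j else 0.0) - W[i][j] for j in range(n)]
         for i in range(n)]
    # 2 * max degree bounds the largest eigenvalue of L, so (shift*I - L)
    # has non-negative eigenvalues and power iteration is well behaved.
    shift = 2.0 * max(deg)
    v = [float(i) for i in range(n)]  # deterministic, non-constant start
    for _ in range(iters):
        # Project out the constant vector (the eigenvalue-0 direction of L).
        mean = sum(v) / n
        v = [x - mean for x in v]
        # One power-iteration step on (shift*I - L).
        v = [shift * v[i] - sum(L[i][j] * v[j] for j in range(n))
             for i in range(n)]
        norm = sum(x * x for x in v) ** 0.5
        v = [x / norm for x in v]
    return v


# Hypothetical affinities: nodes 0-2 and 3-5 are tightly linked,
# with a single weak edge (weight 0.1) between node 2 and node 3.
W = [
    [0.0, 1.0, 1.0, 0.0, 0.0, 0.0],
    [1.0, 0.0, 1.0, 0.0, 0.0, 0.0],
    [1.0, 1.0, 0.0, 0.1, 0.0, 0.0],
    [0.0, 0.0, 0.1, 0.0, 1.0, 1.0],
    [0.0, 0.0, 0.0, 1.0, 0.0, 1.0],
    [0.0, 0.0, 0.0, 1.0, 1.0, 0.0],
]

v = fiedler_vector(W)
groups = [0 if x < 0 else 1 for x in v]
print(groups)
```

The sign pattern of the vector separates nodes 0 to 2 from nodes 3 to 5, even though no coordinates were ever supplied, only pairwise affinities.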
Principles
- Similarity is distinct from distance.
- Kernels define geometry in implicit feature spaces.
- Bandwidth is a geometric choice, not a mere parameter.
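To make the bandwidth principle concrete, here is a minimal sketch that builds a Gaussian affinity matrix for three points at several bandwidths. It assumes the common parameterization k(x, z) = exp(-‖x - z‖² / (2σ²)); the article may use a different but equivalent form, and the points and σ values are invented for illustration.

```python
import math


def gaussian_kernel(x, z, sigma):
    """Gaussian (RBF) similarity, assuming the exp(-d^2 / (2 sigma^2)) form."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def affinity_matrix(points, sigma):
    """Pairwise Gaussian affinities between all points."""
    return [[gaussian_kernel(p, q, sigma) for q in points] for p in points]


# Two nearby points and one far away (illustrative values).
points = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0)]

narrow = affinity_matrix(points, sigma=0.5)
wide = affinity_matrix(points, sigma=10.0)

for sigma in (0.5, 2.0, 10.0):
    row = affinity_matrix(points, sigma)[0]
    # Affinities of the first point to every point, at this bandwidth.
    print(sigma, [round(a, 3) for a in row])
```

With a narrow bandwidth the far point is effectively invisible to the first point, while a wide bandwidth makes all three points look alike. The same data therefore yields different neighborhood structure depending on σ, which is why the choice behaves like a geometric decision.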
Method
A kernel computes an inner product in a feature space, k(x, z) = ⟨φ(x), φ(z)⟩, without ever constructing the feature map φ explicitly. Spectral methods use eigenvectors of the graph Laplacian to assemble local relations into a global embedding.
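The identity k(x, z) = ⟨φ(x), φ(z)⟩ can be checked numerically. The sketch below assumes a homogeneous degree-2 polynomial kernel on 2D inputs, for which one valid feature map is φ(x) = (x₁², √2·x₁x₂, x₂²). The function names and sample vectors are illustrative choices, not taken from the article.

```python
import math


def poly2_kernel(x, z):
    """Homogeneous degree-2 polynomial kernel: (x . z)^2, computed in input space."""
    return (x[0] * z[0] + x[1] * z[1]) ** 2


def phi(x):
    """Explicit feature map whose inner product reproduces poly2_kernel in 2D."""
    return (x[0] ** 2, math.sqrt(2) * x[0] * x[1], x[1] ** 2)


def dot(u, v):
    """Plain Euclidean inner product."""
    return sum(a * b for a, b in zip(u, v))


x = (1.0, 2.0)
z = (3.0, -1.0)

implicit = poly2_kernel(x, z)    # never builds phi
explicit = dot(phi(x), phi(z))   # builds the 3D features, then takes the dot product

print(implicit, explicit)
```

Both routes agree up to floating-point error, but the kernel route only touches the original 2D vectors. For kernels such as the Gaussian, where the feature space is infinite-dimensional, the implicit route is the only practical one.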
In practice
- Use cosine similarity for directional alignment.
- Employ Gaussian kernels for smooth, localized similarity.
- Apply the Nyström approximation for scalable kernel methods.
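A small sketch of the first recommendation, showing how cosine similarity and Euclidean distance can rank the same neighbors differently. The three vectors are made up for illustration, and the helper names are assumptions of this example.

```python
import math


def cosine_similarity(x, z):
    """Cosine of the angle between x and z; ignores vector length."""
    dot = sum(a * b for a, b in zip(x, z))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_z = math.sqrt(sum(b * b for b in z))
    return dot / (norm_x * norm_z)


def euclidean_distance(x, z):
    """Straight-line distance between x and z in coordinate space."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, z)))


a = (1.0, 1.0)
b = (10.0, 10.0)  # same direction as a, much larger magnitude
c = (1.0, 0.0)    # close to a in space, different direction

print("cosine  a~b:", round(cosine_similarity(a, b), 3),
      " a~c:", round(cosine_similarity(a, c), 3))
print("distance a-b:", round(euclidean_distance(a, b), 3),
      " a-c:", round(euclidean_distance(a, c), 3))
```

By distance, `c` is the nearer neighbor of `a`; by cosine similarity, `b` is the better match because it points in the same direction. Which answer is right depends on whether magnitude carries meaning for the task.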
Topics
- Coordinate-Free Geometry
- Kernel Functions
- Affinity Graphs
- Spectral Analysis
- Self-Attention Mechanism
Best for: AI Scientist, Machine Learning Engineer, Data Scientist
Editorial summary, takeaway, and curation by AIssential. Original article published by Agus’s Substack.