L2 Distance Was Giving Me Wrong Answers. Here’s the Metric That Fixed It.

2026-05-16 · Source: Towards AI - Medium · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics, Software Development & Engineering · Depth: Advanced, medium

Summary

The article examines why L2 distance falls short when comparing complex data distributions, using the audio fingerprints in OmniPulse as the running example. L2 distance asks only "are these numbers similar?", comparing values coordinate by coordinate. For fingerprints that describe how energy is distributed across wavelet scales and time, that question misses structural differences. The author introduces Wasserstein distance, and in particular Sliced-Wasserstein (SW₁), as a better fit because it measures the "work" required to transform one distribution into another. Sliced-Wasserstein avoids the O(N³) cost of exact Wasserstein by projecting the high-dimensional data onto random 1D directions, computing the 1D Wasserstein distance along each direction in O(N log N), and averaging the results. With L projections, the total cost is O(L × N log N), which makes the metric practical for large datasets. The article also covers the implementation of a Rust library, `sliced-wasserstein`, its correctness guarantees, and real-world test results showing that the metric captures physically coherent signal structure in audio fingerprints.
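The projection-and-averaging step described above can be written down compactly. The following is a sketch under stated assumptions, not a formula quoted from the article: both distributions are taken to be empirical measures with N equally weighted points, x_(i) and y_(i) denote the sorted 1D samples, θ_l are directions drawn uniformly from the unit sphere, and L is the number of projections.

```latex
% 1D Wasserstein-1 between two equal-size samples: sort, then average the gaps
W_1(\mu, \nu) = \frac{1}{N} \sum_{i=1}^{N} \left| x_{(i)} - y_{(i)} \right|

% Sliced-Wasserstein: average the 1D distance over random directions
SW_1(\mu, \nu)
  = \int_{S^{d-1}} W_1\!\left(\theta_{\#}\mu,\ \theta_{\#}\nu\right) d\sigma(\theta)
  \approx \frac{1}{L} \sum_{l=1}^{L} W_1\!\left(\theta_{l\#}\mu,\ \theta_{l\#}\nu\right),
  \qquad \theta_l \sim \mathrm{Unif}\!\left(S^{d-1}\right)
```

The sort dominates each 1D term, which is where the O(N log N) per projection and the O(L × N log N) total come from. Projecting the points adds a further O(N × d) per direction for d-dimensional data.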

Key takeaway

AI Scientists and Research Scientists working with data that represents distributions or point clouds, such as audio fingerprints, LiDAR scans, or document embeddings, should consider Sliced-Wasserstein distance instead of L2. Because it accounts for the geometry of a distribution rather than only coordinate-wise differences, it captures meaningful structural differences, which can lead to more accurate retrieval and analysis, as illustrated by its use in OmniPulse's HNSW index.

Key insights

Sliced-Wasserstein distance compares complex data distributions by measuring transformation work, which captures structure that L2 distance misses.

Principles

L2 distance is insufficient for comparing structural similarity in distributions.
Wasserstein distance quantifies the "work" to transform one distribution into another.
Slicing enables efficient approximation of high-dimensional Wasserstein distance.
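To make the first two principles concrete, here is a minimal Python sketch using only the standard library. The three toy histograms and both helper functions are illustrative assumptions, not data or code from the article. Each histogram puts all of its energy in a single bin. L2 sees the two candidates as equally far from the reference, while the 1D Wasserstein distance grows with how far the energy has to move.

```python
import math

def l2_distance(p, q):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def wasserstein_1d_hist(p, q):
    """W1 between two histograms on the same unit-width bins with equal total mass.

    In 1D, W1 equals the summed absolute difference of the two cumulative sums.
    """
    cum = 0.0
    total = 0.0
    for a, b in zip(p, q):
        cum += a - b          # running difference of the two CDFs
        total += abs(cum)     # mass that still has to cross this bin boundary
    return total

# Hypothetical energy histograms: one unit of energy in a single bin.
ref  = [1, 0, 0, 0, 0, 0, 0, 0]
near = [0, 1, 0, 0, 0, 0, 0, 0]   # energy moved by one bin
far  = [0, 0, 0, 0, 0, 0, 0, 1]   # energy moved by seven bins

print(f"{l2_distance(ref, near):.3f}", f"{l2_distance(ref, far):.3f}")  # 1.414 1.414
print(wasserstein_1d_hist(ref, near), wasserstein_1d_hist(ref, far))    # 1.0 7.0
```

L2 only registers that two bins differ, so both candidates score √2. The Wasserstein value reflects the distance the energy travels, which is the "work" the second principle refers to.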

Method

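Sliced-Wasserstein projects high-dimensional distributions onto multiple random 1D lines, computes the 1D Wasserstein distance for each projection, and averages these distances. The result is a tractable approximation that stands in for the exact high-dimensional Wasserstein distance rather than reproducing its exact value.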
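The method can be sketched in a few lines of standard-library Python. This is an illustration under stated assumptions, not the `sliced-wasserstein` crate's API or the author's implementation. It assumes both point clouds have the same number of equally weighted points. The parameter names `n_projections` and `seed` are borrowed from the article, while the function names and defaults are hypothetical.

```python
import math
import random

def wasserstein_1d(xs, ys):
    """W1 between two equal-size 1D samples: sort both, average |x_(i) - y_(i)|."""
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equal sample counts")
    xs, ys = sorted(xs), sorted(ys)
    return sum(abs(a - b) for a, b in zip(xs, ys)) / len(xs)

def random_unit_vector(dim, rng):
    """Uniform direction on the unit sphere via a normalized Gaussian draw."""
    while True:
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(c * c for c in v))
        if norm > 0.0:
            return [c / norm for c in v]

def sliced_wasserstein(a, b, n_projections=64, seed=0):
    """Monte Carlo estimate of SW1 between two point clouds (lists of tuples)."""
    dim = len(a[0])
    rng = random.Random(seed)  # fixed seed, so repeated calls give the same value
    total = 0.0
    for _ in range(n_projections):
        theta = random_unit_vector(dim, rng)
        # Project every point onto the direction theta (a dot product per point).
        proj_a = [sum(t * c for t, c in zip(theta, p)) for p in a]
        proj_b = [sum(t * c for t, c in zip(theta, p)) for p in b]
        total += wasserstein_1d(proj_a, proj_b)
    return total / n_projections

# Hypothetical 2D point clouds: a unit square and the same square shifted by 3.
cloud = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
shifted = [(x + 3.0, y) for x, y in cloud]

d_same = sliced_wasserstein(cloud, cloud)
d_shift = sliced_wasserstein(cloud, shifted)
print(d_same, d_shift)
```

For a pure translation, each projection contributes the length of the shift along that direction, so the estimate stays between 0 and the shift length of 3. Raising `n_projections` reduces the Monte Carlo variance at a proportional increase in cost.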

In practice

Use the `sliced-wasserstein` crate for distribution comparisons.
Configure `n_projections` to trade accuracy against speed.
Set `seed` for deterministic distance calculations.

Topics

L2 Distance Limitations
Wasserstein Distance
Sliced-Wasserstein
Audio Fingerprints
Wavelet Scattering Transform

Best for: AI Scientist, Research Scientist, Machine Learning Engineer, AI Engineer, Data Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by Towards AI - Medium.