Fast algorithms for learning a Gaussian under halfspace truncation with optimal sample complexity

2026-06-25 · Source: Machine Learning · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Mathematics & Computational Sciences · Depth: Expert, quick

Summary

A new efficient algorithm addresses the fundamental problem of learning a high-dimensional Gaussian truncated to an unknown halfspace. This work improves upon previous polynomial-time algorithms, such as Lee, Mehrotra and Zampetakis (FOCS'24), which had sub-optimal sample and time complexity. The new algorithm achieves optimal sample complexity of n = Õ(d^2/ε^2) and learns the underlying Gaussian to an error ε in total variation distance. Its runtime is dominated by empirical covariance matrix computation, making both sample and time complexity optimal even without truncation. The core innovation is a novel reinterpretation of low-degree moments of the truncated Gaussian using a relative truncation parameter, which enables direct parameter recovery and bypasses time-intensive projected stochastic gradient descent.

Key takeaway

For Machine Learning Engineers and AI Scientists working with high-dimensional data subject to implicit selection biases or halfspace truncation, you should consider this new algorithm. It offers optimal sample complexity of Õ(d^2/ε^2) and significantly faster runtime by avoiding projected stochastic gradient descent. This allows you to learn underlying Gaussian distributions more efficiently and accurately, even when data is truncated, without incurring additional complexity compared to untruncated learning.

Key insights

A novel moment reinterpretation enables optimal, fast learning of truncated high-dimensional Gaussians, effectively "for free."

Principles

Low-degree moments of truncated Gaussians can be reinterpreted via a relative truncation parameter.
Direct parameter recovery is achievable, bypassing iterative optimization.

Method

Reinterpret low-degree moments using a relative truncation parameter to directly determine untruncated Gaussian parameters, avoiding projected stochastic gradient descent.

In practice

This approach avoids the time-intensive projected stochastic gradient descent procedure.
It offers optimal sample and time complexity for truncated Gaussian learning.

Topics

Gaussian Learning
Halfspace Truncation
Optimal Sample Complexity
High-Dimensional Statistics
Moment Methods
Algorithm Efficiency

Best for: Research Scientist, AI Scientist, Machine Learning Engineer

Related on AIssential

Open in AIssential →

Editorial summary, takeaway, and curation by AIssential. Original article published by Machine Learning.