Quantizing With Randomized Hadamard Transforms: Efficient Heuristic Now Proven

Source: Takara TLDR - Daily AI Papers · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Mathematics & Computational Sciences · Depth: Expert, medium

Summary

This paper proves that randomized Hadamard transforms (RHTs), a fast orthogonal transform used as an efficient heuristic substitute for uniform random rotations (URRs) in quantization schemes, can match URRs when composed. A URR makes each coordinate of a high-dimensional input converge to a Gaussian distribution; a single RHT does not guarantee this. The study shows that composing two RHTs on a $d$-dimensional input vector brings the marginal distribution of each coordinate within $O(d^{-1/2})$ of a standard Gaussian in both Kolmogorov and $1$-Wasserstein distance. As a result, the two-RHT composition asymptotically matches URRs in modern compression schemes such as DRIVE and QUIC-FL. Vector Quantization (VQ) additionally requires weak correlation across blocks of coordinates; for it, three RHTs are shown to yield decaying coordinate covariance and hence expected error similar to URRs. The authors also propose an $O(d)$ runtime check that adjusts the number of RHTs dynamically based on the moments of the input.
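To make the objects in the summary concrete, here is a minimal NumPy sketch of one RHT, taken in its usual form $x \mapsto \frac{1}{\sqrt{d}} H D x$ with $H$ the Hadamard matrix and $D$ a diagonal matrix of random signs, and of a composition of several RHTs with independent signs. The function names (`fwht`, `rht`, `compose_rhts`, `random_signs`) are illustrative and do not come from the paper, and the sketch assumes $d$ is a power of two. The one-hot input at the end shows why one RHT is not enough in the worst case: every output coordinate then has magnitude exactly $1/\sqrt{d}$, which is far from Gaussian.

```python
import numpy as np

def fwht(x):
    """Unnormalized fast Walsh-Hadamard transform; len(x) must be a power of two."""
    y = np.asarray(x, dtype=np.float64).copy()
    d = y.shape[0]
    if d == 0 or d & (d - 1):
        raise ValueError("length must be a power of two")
    h = 1
    while h < d:
        # Butterfly step: within each block of size 2h, form sums and differences
        # of the first and second half.
        y = y.reshape(-1, 2, h)
        y = np.stack((y[:, 0, :] + y[:, 1, :], y[:, 0, :] - y[:, 1, :]), axis=1)
        y = y.reshape(d)
        h *= 2
    return y

def rht(x, signs):
    """One randomized Hadamard transform: (1/sqrt(d)) * H * diag(signs) * x."""
    x = np.asarray(x, dtype=np.float64)
    return fwht(signs * x) / np.sqrt(x.shape[0])

def compose_rhts(x, sign_list):
    """Apply one RHT per row of sign_list, in order."""
    for signs in sign_list:
        x = rht(x, signs)
    return x

def random_signs(rng, d, k):
    """k independent vectors of uniformly random +/-1 signs."""
    return rng.choice([-1.0, 1.0], size=(k, d))

rng = np.random.default_rng(0)
d = 1024
signs = random_signs(rng, d, 2)

# Worst case for a single RHT: a one-hot input.
e0 = np.zeros(d)
e0[0] = 1.0
one = np.sqrt(d) * rht(e0, signs[0])        # every entry is +1 or -1
two = np.sqrt(d) * compose_rhts(e0, signs)  # entries take many distinct values

print("distinct magnitudes after 1 RHT:", np.unique(np.abs(one)).size)
print("distinct magnitudes after 2 RHTs:", np.unique(np.abs(two)).size)
```

Each RHT costs $O(d \log d)$ through the butterfly recursion, compared with the $O(d^2)$ matrix-vector product needed to apply a dense rotation, which is where the speed advantage comes from.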

Key takeaway

For AI engineers optimizing model quantization, this work gives a provable, faster alternative to computationally intensive uniform random rotations (URRs). If you are implementing gradient compression or inference acceleration, composing two randomized Hadamard transforms asymptotically matches URRs in schemes such as DRIVE and QUIC-FL while running faster. For Vector Quantization, use three RHTs, the number for which the error guarantee is shown. Consider integrating the proposed $O(d)$ runtime check to adapt the number of RHTs to each input, balancing accuracy against computational cost.
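As an illustration of where the rotation sits in a compression pipeline, the following sketch rotates a vector with two RHTs, keeps one sign bit per coordinate plus a single scale, and inverts the rotation on decode. It is a generic rotate-then-sign quantizer written for this summary, not the exact DRIVE or QUIC-FL algorithm; the names `encode`, `decode`, and `rht_inverse` are hypothetical, the scale is simply the least-squares fit of a sign vector to the rotated input, and $d$ is assumed to be a power of two.

```python
import numpy as np

def fwht(x):
    """Unnormalized fast Walsh-Hadamard transform; len(x) must be a power of two."""
    y = np.asarray(x, dtype=np.float64).copy()
    d = y.shape[0]
    if d == 0 or d & (d - 1):
        raise ValueError("length must be a power of two")
    h = 1
    while h < d:
        y = y.reshape(-1, 2, h)
        y = np.stack((y[:, 0, :] + y[:, 1, :], y[:, 0, :] - y[:, 1, :]), axis=1)
        y = y.reshape(d)
        h *= 2
    return y

def rht(x, signs):
    """Forward RHT: (1/sqrt(d)) * H * diag(signs) * x."""
    return fwht(signs * x) / np.sqrt(x.shape[0])

def rht_inverse(y, signs):
    """Inverse RHT: H/sqrt(d) is symmetric and orthogonal, and diag(signs) is its own inverse."""
    return signs * fwht(y) / np.sqrt(y.shape[0])

def encode(x, sign_list):
    """Rotate with the given RHTs, then keep one bit per coordinate and one scale."""
    r = np.asarray(x, dtype=np.float64)
    for signs in sign_list:
        r = rht(r, signs)
    scale = np.abs(r).mean()  # least-squares scale for approximating r by scale * sign(r)
    bits = r >= 0
    return bits, scale

def decode(bits, scale, sign_list):
    """Rebuild the rotated vector from the bits and undo the RHTs in reverse order."""
    r_hat = scale * np.where(bits, 1.0, -1.0)
    for signs in sign_list[::-1]:
        r_hat = rht_inverse(r_hat, signs)
    return r_hat

rng = np.random.default_rng(0)
d = 1024
x = rng.standard_t(df=3, size=d)               # a heavy-tailed test input
sign_list = rng.choice([-1.0, 1.0], size=(2, d))

bits, scale = encode(x, sign_list)
x_hat = decode(bits, scale, sign_list)
nmse = np.sum((x - x_hat) ** 2) / np.sum(x ** 2)
print("normalized squared error:", nmse)
```

In a real deployment the sender and receiver would derive `sign_list` from a shared seed rather than transmit it, so the payload is only the $d$ bits and the scale.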

Key insights

Composing multiple randomized Hadamard transforms effectively approximates uniform random rotations for quantization.

Method

The method composes two RHTs (for schemes such as DRIVE and QUIC-FL) or three RHTs (for Vector Quantization) to obtain statistical properties similar to those of uniform random rotations (URRs), with an optional $O(d)$ runtime check that adapts the number of RHTs to the input.
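The summary does not spell out the runtime check, so the sketch below is only a stand-in showing what an $O(d)$ moment-based test could look like. It uses the ratio $\|x\|_3^3 / \|x\|_2^3$, which is the quantity that the Berry-Esseen theorem uses to bound how far a single random-sign sum is from Gaussian: it equals $d^{-1/2}$ for a perfectly flat vector and $1$ for a one-hot vector. The statistic, the `threshold` parameter, and the `few`/`many` defaults are all assumptions made for illustration, not the paper's criterion.

```python
import numpy as np

def third_moment_ratio(x):
    """||x||_3^3 / ||x||_2^3, computed in one O(d) pass; 0.0 for the zero vector."""
    x = np.asarray(x, dtype=np.float64)
    norm2 = np.sqrt(np.sum(x * x))
    if norm2 == 0.0:
        return 0.0
    return float(np.sum(np.abs(x) ** 3) / norm2 ** 3)

def choose_num_rhts(x, threshold, few=1, many=2):
    """Hypothetical rule: spread-out inputs get fewer RHTs, concentrated inputs get more."""
    return few if third_moment_ratio(x) <= threshold else many

d = 1024
flat = np.ones(d)      # energy spread evenly over all coordinates
hot = np.zeros(d)
hot[5] = 1.0           # all energy in one coordinate

print(third_moment_ratio(flat), choose_num_rhts(flat, threshold=0.1))
print(third_moment_ratio(hot), choose_num_rhts(hot, threshold=0.1))
```

A check of this kind costs less than a single RHT, so it can run on every input before deciding how many transforms to apply.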

Best for: AI Engineer, NLP Engineer, Research Scientist, AI Scientist, Machine Learning Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Takara TLDR - Daily AI Papers.