Quantizing With Randomized Hadamard Transforms: Efficient Heuristic Now Proven
Summary
This research proves that randomized Hadamard transforms (RHTs) are a fast, orthogonal substitute for uniform random rotations (URRs) in quantization schemes. Under a URR, each coordinate of a high-dimensional vector is approximately Gaussian. A single RHT does not guarantee this: for example, it maps a standard basis vector to a vector whose entries are all $\pm 1/\sqrt{d}$. The study shows that composing two RHTs on a $d$-dimensional input brings the marginal distribution of each suitably normalized coordinate within $O(d^{-1/2})$ of a standard Gaussian, in both the Kolmogorov and the $1$-Wasserstein distance. As a result, the two-RHT composition asymptotically matches URRs in modern compression schemes such as DRIVE and QUIC-FL. Vector Quantization (VQ) also requires weak correlation across blocks of coordinates. For VQ, three RHTs are shown to yield decaying coordinate covariance and therefore an expected error similar to that of URRs. The authors also propose an $O(d)$ runtime check that adjusts the number of RHTs based on moments of the input.
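To see why one RHT is not enough while two help, the sketch below feeds the sparse input $e_1$ through one or two RHTs and measures the empirical Kolmogorov distance between the rescaled first coordinate and $N(0,1)$. It is a minimal illustration under stated assumptions, not the paper's experiment: the RHT is taken as $\frac{1}{\sqrt{d}} H D x$ with a Sylvester-ordered Hadamard matrix $H$ and random signs $D$. The helper names (`rht`, `ks_to_gaussian`, `first_coordinate_samples`) and the values $d = 128$ and 2000 trials are hypothetical choices.

```python
import math
import random


def fwht(x):
    # Unnormalized fast Walsh-Hadamard transform (Sylvester order); len(x) must be a power of two.
    a = list(x)
    h = 1
    while h < len(a):
        for i in range(0, len(a), 2 * h):
            for j in range(i, i + h):
                a[j], a[j + h] = a[j] + a[j + h], a[j] - a[j + h]
        h *= 2
    return a


def rht(x, rng):
    # One RHT with fresh random signs: (1/sqrt(d)) * H * D * x.
    d = len(x)
    signs = [rng.choice((-1.0, 1.0)) for _ in range(d)]
    return [v / math.sqrt(d) for v in fwht([s * v for s, v in zip(signs, x)])]


def std_normal_cdf(t):
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


def ks_to_gaussian(samples):
    # Kolmogorov distance between the empirical CDF of `samples` and N(0, 1).
    xs = sorted(samples)
    n = len(xs)
    return max(
        max((i + 1) / n - std_normal_cdf(t), std_normal_cdf(t) - i / n)
        for i, t in enumerate(xs)
    )


def first_coordinate_samples(d, num_rhts, trials, seed=0):
    # Sample the first output coordinate for the sparse input e_1, rescaled by sqrt(d)
    # so that a uniform random rotation would make it approximately N(0, 1).
    rng = random.Random(seed)
    samples = []
    for _ in range(trials):
        y = [1.0] + [0.0] * (d - 1)
        for _ in range(num_rhts):
            y = rht(y, rng)
        samples.append(math.sqrt(d) * y[0])
    return samples


if __name__ == "__main__":
    for k in (1, 2):
        dist = ks_to_gaussian(first_coordinate_samples(128, k, 2000))
        print(f"{k} RHT(s): Kolmogorov distance ~ {dist:.3f}")
```

With one RHT the rescaled coordinate is a random sign $\pm 1$, so the distance stays near $\Phi(1) - 0.5 \approx 0.34$ however large $d$ is. After a second RHT it becomes a normalized sum of $d$ random signs, and the distance shrinks as $d$ grows.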
Key takeaway
For AI engineers optimizing model quantization, this work provides a provable alternative to computationally expensive uniform random rotations. If you are implementing gradient compression or inference acceleration, two randomized Hadamard transforms asymptotically match URRs. Each transform costs only $O(d \log d)$ time via the fast Walsh-Hadamard transform, compared with $O(d^2)$ to apply a dense rotation. For Vector Quantization, use three RHTs, which the authors show suffice for an expected error comparable to URRs. Consider integrating the proposed $O(d)$ runtime check to adapt the number of RHTs to each input, trading accuracy against computational cost.
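As a rough illustration of the cost gap, assume a hypothetical dimension $d = 4096$. Count only the core arithmetic per vector and ignore sign flips, scaling, and memory traffic. A dense rotation needs $d^2$ multiply-adds, while each fast Walsh-Hadamard transform needs $d \log_2 d$ additions or subtractions:

$$
d^2 = 4096^2 = 16{,}777{,}216
\qquad\text{vs.}\qquad
2\,d\log_2 d = 2 \cdot 4096 \cdot 12 = 98{,}304 .
$$

By this count, two RHTs use roughly 170 times fewer operations than one dense rotation, and three RHTs ($147{,}456$ operations) use roughly 114 times fewer. Real speedups depend on hardware and implementation.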
Key insights
Composing multiple randomized Hadamard transforms effectively approximates uniform random rotations for quantization.
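The benefit of rotating before quantizing shows up even with the crudest quantizer. The sketch below keeps only the signs of the rotated vector plus one least-squares scale ($\lVert y\rVert_1 / d$), then inverts the rotation. It is a simplified, hypothetical illustration, not the DRIVE or QUIC-FL algorithm. The function names and the choice of a sparse test input are assumptions made for the example.

```python
import math
import random


def fwht(x):
    # Unnormalized fast Walsh-Hadamard transform; len(x) must be a power of two.
    a = list(x)
    h = 1
    while h < len(a):
        for i in range(0, len(a), 2 * h):
            for j in range(i, i + h):
                a[j], a[j + h] = a[j] + a[j + h], a[j] - a[j + h]
        h *= 2
    return a


def rotate(x, sign_sets):
    # Apply each RHT (1/sqrt(d)) * H * D in turn.
    d = len(x)
    for signs in sign_sets:
        x = [v / math.sqrt(d) for v in fwht([s * v for s, v in zip(signs, x)])]
    return x


def unrotate(y, sign_sets):
    # Inverse of rotate: undo the RHTs in reverse order with D * H / sqrt(d).
    d = len(y)
    for signs in reversed(sign_sets):
        y = [s * v / math.sqrt(d) for s, v in zip(signs, fwht(y))]
    return y


def one_bit_roundtrip(x, num_rhts, seed=0):
    """Rotate, keep only signs plus one scale, then un-rotate. Returns (x_hat, nmse)."""
    rng = random.Random(seed)
    d = len(x)
    sign_sets = [[rng.choice((-1.0, 1.0)) for _ in range(d)] for _ in range(num_rhts)]
    y = rotate(x, sign_sets)
    scale = sum(abs(v) for v in y) / d  # least-squares scale for a fixed sign pattern
    q = [scale if v >= 0 else -scale for v in y]
    x_hat = unrotate(q, sign_sets)
    err = sum((a - b) ** 2 for a, b in zip(x, x_hat))
    return x_hat, err / sum(v * v for v in x)
```

For a one-hot input of length $d$, skipping the rotation leaves a normalized error of exactly $1 - 1/d$. After two RHTs the coordinates look roughly Gaussian, and the error approaches the Gaussian 1-bit value $1 - 2/\pi \approx 0.36$.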
Principles
- Two RHTs suffice for marginal Gaussian convergence.
- Three RHTs ensure decaying coordinate covariance for VQ.
Method
The method composes two randomized Hadamard transforms (RHTs) to obtain URR-like marginal coordinate distributions, or three RHTs when Vector Quantization also needs decaying coordinate covariance. An optional $O(d)$ runtime check adapts the number of RHTs to the input.
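A minimal sketch of composing $k$ RHTs and inverting the composition follows. It assumes power-of-two $d$, a Sylvester-ordered Hadamard matrix, and i.i.d. random signs per transform. The function names (`compose_rhts`, `invert_rhts`, `make_signs`) are hypothetical and not taken from the paper's code. The inverse uses the facts that $H$ is symmetric with $H H = d I$ and that each sign matrix $D$ is its own inverse.

```python
import math
import random
from typing import List, Sequence


def fwht(x: Sequence[float]) -> List[float]:
    """Unnormalized fast Walsh-Hadamard transform (Sylvester order), O(d log d)."""
    a = list(x)
    n = len(a)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    h = 1
    while h < n:
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                a[j], a[j + h] = a[j] + a[j + h], a[j] - a[j + h]
        h *= 2
    return a


def rht(x: Sequence[float], signs: Sequence[float]) -> List[float]:
    """One randomized Hadamard transform: (1/sqrt(d)) * H * D * x."""
    scale = 1.0 / math.sqrt(len(x))
    return [scale * v for v in fwht([s * v for s, v in zip(signs, x)])]


def inverse_rht(y: Sequence[float], signs: Sequence[float]) -> List[float]:
    """Inverse of rht: D * H * y / sqrt(d)."""
    scale = 1.0 / math.sqrt(len(y))
    return [scale * s * v for s, v in zip(signs, fwht(y))]


def make_signs(d: int, num_rhts: int, rng: random.Random) -> List[List[float]]:
    """Independent random sign vectors, one per RHT (e.g. 2 in general, 3 for VQ)."""
    return [[rng.choice((-1.0, 1.0)) for _ in range(d)] for _ in range(num_rhts)]


def compose_rhts(x: Sequence[float], sign_sets: List[List[float]]) -> List[float]:
    """Apply the RHTs in order; the result is an orthogonal transform of x."""
    y = list(x)
    for signs in sign_sets:
        y = rht(y, signs)
    return y


def invert_rhts(y: Sequence[float], sign_sets: List[List[float]]) -> List[float]:
    """Undo compose_rhts by applying the inverses in reverse order."""
    x = list(y)
    for signs in reversed(sign_sets):
        x = inverse_rht(x, signs)
    return x
```

In a quantization pipeline, the sender and receiver would share the seed that generates `sign_sets`, so only the quantized rotated vector needs to be transmitted.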
In practice
- Use two RHTs for gradient compression.
- Employ three RHTs for Vector Quantization.
- Set the number of RHTs dynamically from input moments.
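This summary does not give the exact statistic or thresholds of the paper's $O(d)$ check, so the sketch below substitutes a hypothetical stand-in based on a standard result. After one RHT, each coordinate rescaled by $\sqrt{d}/\lVert x\rVert_2$ equals $\sum_j \varepsilon_j x_j / \lVert x\rVert_2$ with i.i.d. random signs $\varepsilon_j$. The Berry-Esseen theorem therefore bounds its Kolmogorov distance to $N(0,1)$ by $C_0 \sum_j |x_j|^3 / \lVert x\rVert_2^3$, where $C_0 \approx 0.56$ for non-identically distributed summands. The names `berry_esseen_bound_one_rht` and `choose_num_rhts`, the default `tol`, and the `for_vq` flag are all assumptions.

```python
def berry_esseen_bound_one_rht(x, c0=0.56):
    """Hypothetical O(d) check: Berry-Esseen bound on the Kolmogorov distance to N(0, 1)
    of any single coordinate after one RHT (rescaled by sqrt(d) / ||x||_2).

    One pass accumulates the second and third absolute moments of x.
    """
    s2 = 0.0
    s3 = 0.0
    for v in x:
        a = abs(v)
        s2 += a * a
        s3 += a * a * a
    if s2 == 0.0:
        raise ValueError("x must be nonzero")
    return c0 * s3 / s2 ** 1.5


def choose_num_rhts(x, tol=0.05, for_vq=False):
    """Hypothetical policy: 3 RHTs for VQ; otherwise 1 if the one-RHT bound already
    meets `tol`, else 2 (the count shown to suffice for Gaussian-like marginals)."""
    if for_vq:
        return 3
    return 1 if berry_esseen_bound_one_rht(x) <= tol else 2
```

Dense, flat inputs pass the check with a single RHT. Sparse inputs such as a basis vector, whose bound stays at $C_0$ regardless of $d$, fall back to two.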
Topics
- Randomized Hadamard Transforms
- Uniform Random Rotations
- Quantization
- Vector Quantization
- Gradient Compression
Best for: AI Engineer, NLP Engineer, Research Scientist, AI Scientist, Machine Learning Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Takara TLDR - Daily AI Papers.