Audio Processing for Machine Learning (Part 2): Sound Power, Intensity, Loudness, and Timbre
Summary
This article, the second in a series on audio processing for machine learning, details key physical and perceptual properties of sound. It explains sound power as the total acoustic energy a source emits per unit time, measured in watts (W), and sound intensity as that power distributed over an area, measured in W/m². Audible intensities span an extraordinary range, from the threshold of hearing (10⁻¹² W/m², or 0 dB) to the threshold of pain (10 W/m², or 130 dB), and perceived loudness is not directly proportional to physical intensity, which is why the logarithmic decibel scale is used. Loudness is a subjective perception that also depends on frequency, duration, and individual hearing, as visualized by equal-loudness contours. Finally, the article explores timbre, the unique "color" of a sound, which it attributes to harmonics, the ADSR envelope, and source material, and which is crucial for sound classification in ML.
Key takeaway
For Machine Learning Engineers developing audio applications, understanding the distinction between physical sound properties like intensity and perceptual ones like loudness and timbre is critical. You should account for the non-linear human perception of sound, especially frequency-dependent loudness (e.g., using equal-loudness contours), when designing models for speech or music. Incorporate features that capture timbre, such as MFCCs or spectral contrast, to improve sound source identification and classification accuracy.
Key insights
Human sound perception is a complex interplay between physical properties and subjective auditory processing.
Principles
- The sound intensities humans can hear span 13 orders of magnitude (10⁻¹² to 10 W/m²).
- Loudness perception varies significantly across different frequencies.
- Timbre defines a sound source's unique acoustic character.
Method
Sound intensity level (β) is calculated using β = 10 log₁₀(I/I₀) dB, where I is the measured intensity in W/m² and I₀ = 1×10⁻¹² W/m² is the reference intensity at the threshold of hearing.
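The formula above can be checked against the two thresholds quoted in the summary. The following is a minimal sketch using only the Python standard library; the function names `intensity_level_db` and `intensity_from_db` are illustrative and do not come from the original article.

```python
import math

# Reference intensity at the threshold of hearing, in W/m^2 (value from the text).
I0 = 1e-12


def intensity_level_db(intensity, reference=I0):
    """Return the sound intensity level in dB for an intensity in W/m^2."""
    if intensity <= 0:
        raise ValueError("intensity must be positive")
    return 10.0 * math.log10(intensity / reference)


def intensity_from_db(level_db, reference=I0):
    """Invert the formula: return the intensity in W/m^2 for a level in dB."""
    return reference * 10.0 ** (level_db / 10.0)


if __name__ == "__main__":
    # The two thresholds quoted in the summary.
    for label, intensity in [("threshold of hearing", 1e-12),
                             ("threshold of pain", 10.0)]:
        print(f"{label}: {intensity_level_db(intensity):.1f} dB")
```

Because the scale is logarithmic, doubling the intensity adds only about 3 dB (10 log₁₀ 2 ≈ 3.01), which is one reason decibels are more convenient than raw W/m² values when a feature spans many orders of magnitude.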
In practice
- Use the decibel scale for practical sound intensity representation.
- Analyze the ADSR (attack, decay, sustain, release) envelope for unique sound characteristics.
- Employ MFCCs and spectral features to capture timbre.
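As an illustration of the ADSR point above, here is a rough sketch in plain Python that synthesizes two tones with different envelopes and estimates how quickly each one reaches its peak. Everything here is an assumption made for illustration: the helper names (`adsr_envelope`, `tone`, `attack_time`), the piecewise-linear envelope shape, the 8 kHz sample rate, and the frame-wise peak detector, which is a crude stand-in for a proper envelope follower. None of it is taken from the original article.

```python
import math


def adsr_envelope(n_samples, sr, attack, decay, sustain_level, release):
    """Piecewise-linear ADSR gain curve. Times in seconds, sustain_level in [0, 1]."""
    a = round(attack * sr)
    d = round(decay * sr)
    r = round(release * sr)
    s = max(n_samples - a - d - r, 0)  # sustain fills whatever time is left
    env = [i / a for i in range(a)]                                  # attack: 0 -> 1
    env += [1.0 - (1.0 - sustain_level) * i / d for i in range(d)]   # decay: 1 -> sustain
    env += [sustain_level] * s                                       # sustain: constant
    env += [sustain_level * (1.0 - i / r) for i in range(r)]         # release: sustain -> 0
    return env[:n_samples]


def tone(freq, sr, envelope):
    """Sine wave at freq Hz, shaped sample by sample with the given envelope."""
    return [g * math.sin(2.0 * math.pi * freq * n / sr)
            for n, g in enumerate(envelope)]


def attack_time(signal, sr, frame_len=256, fraction=0.9):
    """Seconds until the frame-wise peak envelope first reaches `fraction` of its maximum."""
    peaks = [max(abs(x) for x in signal[i:i + frame_len])
             for i in range(0, len(signal), frame_len)]
    threshold = fraction * max(peaks)
    first = next(i for i, p in enumerate(peaks) if p >= threshold)
    return first * frame_len / sr


SR = 8000  # sample rate in Hz (hypothetical, kept low so the example runs quickly)
pluck = tone(440.0, SR, adsr_envelope(SR, SR, 0.01, 0.10, 0.5, 0.2))
pad = tone(440.0, SR, adsr_envelope(SR, SR, 0.30, 0.10, 0.5, 0.2))

if __name__ == "__main__":
    print("pluck attack (s):", attack_time(pluck, SR))
    print("pad attack (s):", attack_time(pad, SR))
```

Both tones have the same pitch and the same sustain level, yet the estimated attack times differ, which is the kind of envelope cue that helps distinguish sound sources. The estimate is quantized to the frame length (32 ms here), so it is only suitable for coarse comparisons.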
Topics
- Audio Processing
- Sound Physics
- Psychoacoustics
- Sound Intensity
- Loudness Perception
- Timbre Analysis
- Machine Learning Features
Best for: AI Student, Machine Learning Engineer, AI Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Machine Learning on Medium.