Audio Processing for Machine Learning (Part 2): Sound Power, Intensity, Loudness, and Timbre

2026-06-22 · Source: Machine Learning on Medium · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics · Depth: Intermediate, medium

Summary

This article, the second in a series on audio processing for machine learning, covers key physical and perceptual properties of sound. Sound power is the total acoustic energy a source emits per unit time, measured in watts (W); sound intensity is that power per unit area, measured in W/m². Audible intensities span an enormous range, from the threshold of hearing (10⁻¹² W/m², or 0 dB) to the threshold of pain (10 W/m², or 130 dB), and perceived loudness is not directly proportional to physical intensity, which is why the logarithmic decibel scale is used. Loudness is a subjective perception that also depends on frequency, duration, and individual hearing; equal-loudness contours visualize its frequency dependence. Finally, the article explores timbre, the unique "color" of a sound, which it attributes to harmonics, the ADSR (attack, decay, sustain, release) envelope, and source material, and which is crucial for sound classification in ML.

Key takeaway

For Machine Learning Engineers developing audio applications, understanding the distinction between physical sound properties like intensity and perceptual ones like loudness and timbre is critical. You should account for the non-linear human perception of sound, especially frequency-dependent loudness (e.g., using equal-loudness contours), when designing models for speech or music. Incorporate features that capture timbre, such as MFCCs or spectral contrast, to improve sound source identification and classification accuracy.
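One common way to approximate frequency-dependent loudness in code is A-weighting, a standard curve that roughly follows the inverse of a moderate-level equal-loudness contour. The source article does not mention A-weighting; it is swapped in here as an assumption because it has a simple closed form. The function name `a_weighting_db` is hypothetical, and this is a minimal sketch rather than a calibrated loudness model.

```python
import math

def a_weighting_db(freq_hz):
    """Approximate A-weighting gain in dB at a given frequency.

    Assumption: A-weighting is used as a stand-in for an equal-loudness
    contour; it is not the method described in the source article.
    """
    if freq_hz <= 0:
        raise ValueError("frequency must be positive")
    f2 = freq_hz ** 2
    # Standard analog A-weighting magnitude response.
    numerator = (12194.0 ** 2) * f2 ** 2
    denominator = (
        (f2 + 20.6 ** 2)
        * math.sqrt((f2 + 107.7 ** 2) * (f2 + 737.9 ** 2))
        * (f2 + 12194.0 ** 2)
    )
    # The +2.00 dB offset normalizes the curve to about 0 dB at 1 kHz.
    return 20.0 * math.log10(numerator / denominator) + 2.00

if __name__ == "__main__":
    for f in (100.0, 1000.0, 4000.0):
        print(f"{f:7.0f} Hz: {a_weighting_db(f):+.1f} dB")
```

Low frequencies receive a strongly negative gain, reflecting the ear's reduced sensitivity there. Adding these gains to a spectrum expressed in dB before feature extraction is one possible way to bias a model toward perceptually relevant bands.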

Key insights

Human sound perception is a complex interplay between physical properties and subjective auditory processing.

Principles

The range of human hearing spans 13 orders of magnitude in sound intensity (10⁻¹² to 10 W/m²).
Loudness perception varies significantly across different frequencies.
Timbre defines a sound source's unique acoustic character.

Method

Sound intensity level (β) is calculated as β = 10 log₁₀(I/I₀) dB, where I is the measured intensity and I₀ = 1×10⁻¹² W/m² is the reference intensity (the threshold of hearing).
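The formula can be checked numerically with a short sketch. The function names below are hypothetical, and `intensity_from_power` additionally assumes an idealized point source radiating equally in all directions with no reflections, which the source article does not state.

```python
import math

I0 = 1e-12  # reference intensity in W/m^2 (threshold of hearing)

def intensity_from_power(power_w, distance_m):
    """Intensity in W/m^2 at a given distance from a source.

    Assumption: ideal point source in a free field, so the power
    spreads evenly over a sphere of area 4 * pi * r^2.
    """
    if distance_m <= 0:
        raise ValueError("distance must be positive")
    return power_w / (4.0 * math.pi * distance_m ** 2)

def intensity_level_db(intensity):
    """Sound intensity level: beta = 10 * log10(I / I0), in dB."""
    if intensity <= 0:
        raise ValueError("intensity must be positive")
    return 10.0 * math.log10(intensity / I0)

if __name__ == "__main__":
    print(intensity_level_db(1e-12))  # threshold of hearing
    print(intensity_level_db(10.0))   # threshold of pain
    near = intensity_level_db(intensity_from_power(1.0, 1.0))
    far = intensity_level_db(intensity_from_power(1.0, 2.0))
    print(near - far)                 # drop in level when distance doubles
```

Under the point-source assumption, doubling the distance quarters the intensity, which corresponds to a drop of about 6 dB in level.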

In practice

Use the decibel scale to represent sound intensity in practice.
Analyze the ADSR envelope to characterize how a sound evolves over time.
Employ MFCCs and spectral features to capture timbre.
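To make the ADSR idea concrete, the sketch below builds a piecewise-linear amplitude envelope from attack, decay, sustain, and release parameters. The function name, the linear segment shapes, and the parameter values are illustrative assumptions; real instruments have more complex envelopes, and the source article does not give an implementation.

```python
def adsr_envelope(n_samples, sample_rate, attack_s, decay_s,
                  sustain_level, release_s):
    """Return a piecewise-linear ADSR envelope as a list of floats in [0, 1].

    Assumption: each stage is a straight line; the sustain stage fills
    whatever length remains after attack, decay, and release.
    """
    a = round(attack_s * sample_rate)
    d = round(decay_s * sample_rate)
    r = round(release_s * sample_rate)
    s = n_samples - (a + d + r)
    if s < 0:
        raise ValueError("attack + decay + release exceed total length")

    env = []
    # Attack: rise from 0 toward 1.
    env += [i / a for i in range(a)]
    # Decay: fall from 1 toward the sustain level.
    env += [1.0 - (1.0 - sustain_level) * i / d for i in range(d)]
    # Sustain: hold a constant level.
    env += [sustain_level] * s
    # Release: fall from the sustain level toward 0.
    env += [sustain_level * (1.0 - i / r) for i in range(r)]
    return env

if __name__ == "__main__":
    env = adsr_envelope(1000, 1000, attack_s=0.1, decay_s=0.2,
                        sustain_level=0.5, release_s=0.3)
    print(len(env), env[0], max(env), env[500])
```

Multiplying a waveform sample by sample with such an envelope changes its perceived timbre without changing its pitch, which is one reason envelope-related features can help distinguish sound sources.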

Topics

Audio Processing
Sound Physics
Psychoacoustics
Sound Intensity
Loudness Perception
Timbre Analysis
Machine Learning Features

Best for: AI Student, Machine Learning Engineer, AI Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Machine Learning on Medium.