SAM Audio - AI at Meta
Summary
Meta has introduced SAM Audio, a generative separation model that isolates a target sound from audio or audio-visual sources using text, visual, or span (temporal) prompts. The model works across general sound, music, and speech, enabling tasks such as isolating instruments, vocals, or speech from background noise. SAM Audio is built on a flow-matching Diffusion Transformer that operates in a DAC-VAE latent space and jointly generates the target audio and the residual (everything else in the mixture). Meta reports performance that surpasses the prior state of the art for every prompting type. The release also includes PE-AV, a new open-source model that brings audio capabilities to Meta's Perception Encoder, and what Meta describes as a first-of-its-kind open-source evaluation dataset for prompted audio separation.
Key takeaway
For research scientists building audio processing applications, SAM Audio offers a single promptable model for sound separation across speech, music, and general sound. Explore its multimodal prompting to improve precision in tasks like noise reduction or speech isolation, and consider using the open-source model and evaluation dataset to benchmark your own systems or speed up development of new audio features.
Key insights
SAM Audio offers state-of-the-art sound separation using multimodal prompts across diverse audio types.
Principles
- Multimodal prompting enhances audio separation.
- Generative separation models can extract target and residual stems.
- Open-source evaluation datasets drive progress.
Method
SAM Audio employs a flow-matching Diffusion Transformer within a DAC-VAE latent space to jointly generate target and residual audio from mixtures, guided by text, visual, or temporal prompts.
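To make the flow-matching idea concrete, here is a minimal toy sketch in plain Python. It is not SAM Audio's implementation: the function names (`oracle_velocity`, `euler_sample`), the latent sizes, and the choice to represent joint generation as one concatenated target-plus-residual vector are all assumptions made for illustration. A real system would replace the oracle velocity with a trained, prompt-conditioned Diffusion Transformer and decode the resulting latents back to audio.

```python
import math
import random


def oracle_velocity(x, t, x1):
    """Hypothetical stand-in for a trained network.

    Returns the velocity of the straight-line path from the current
    point x toward a known endpoint x1 at time t in [0, 1).
    """
    return [(b - a) / (1.0 - t) for a, b in zip(x, x1)]


def euler_sample(x0, velocity_fn, num_steps=32):
    """Integrate dx/dt = v(x, t) from t=0 to t=1 with fixed-step Euler."""
    x = list(x0)
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt  # largest t is 1 - dt, so (1 - t) is never zero
        v = velocity_fn(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


random.seed(0)
latent_dim = 8  # illustrative size, not taken from the source

# Toy "ground truth" latents for the target and the residual.
target = [random.gauss(0.0, 1.0) for _ in range(latent_dim)]
residual = [random.gauss(0.0, 1.0) for _ in range(latent_dim)]

# Assumption: joint generation is modeled as one concatenated state.
x1 = target + residual

# Sampling starts from Gaussian noise and follows the velocity field.
x0 = [random.gauss(0.0, 1.0) for _ in range(2 * latent_dim)]
sample = euler_sample(x0, lambda x, t: oracle_velocity(x, t, x1))

# Split the joint sample back into its two halves.
target_hat = sample[:latent_dim]
residual_hat = sample[latent_dim:]
```

Because the oracle field already knows the endpoint, the sampler recovers the two halves almost exactly; the point of the sketch is only the sampling loop, which turns noise into a joint target-and-residual latent by following a velocity field.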
In practice
- Isolate specific instruments or vocals from music.
- Extract speech from noisy environments.
- Remove unwanted background sounds from videos.
Topics
- SAM Audio
- Audio Separation
- Multimodal Prompting
- Diffusion Transformer
- Perception Encoder Audio Video
Best for: Research Scientist, AI Scientist, Machine Learning Engineer, AI Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by ai.meta.com via Google News.