Beyond Waveform Robustness: Robust Feature-Vocoder Adversarial Attacks on Automatic Speech Recognition

Source: Artificial Intelligence · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Cybersecurity & Data Privacy · Depth: Expert, quick

Summary

A new adversarial attack, the Clean-Referenced Feature-Vocoder Attack, targets automatic speech recognition (ASR) systems by moving the adversarial search space from raw waveforms to self-supervised learning (SSL) representations. Instead of perturbing waveform samples directly, it perturbs SSL features, which encode more generalizable acoustic-phonetic information, and then uses a vocoder to reconstruct them into speech-like adversarial signals. This design improves transferability to black-box ASR systems and bypasses defenses that bound perturbations in the waveform domain. Optimized solely on raw Whisper-small as a public surrogate model, the attack achieved a word error rate (WER) 26.6 higher than the state-of-the-art baseline on black-box models (for an attack, higher WER means greater success). It also remained effective against multiple training-based defenses, yielding a +36.2 WER gain and exposing a significant blind spot in current ASR robustness evaluation.
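WER gains such as +26.6 are easiest to read against the metric itself. The following is a minimal Python sketch of word error rate: (substitutions + deletions + insertions) divided by the number of reference words, computed with a word-level Levenshtein distance and reported as a percentage. It assumes the summary's WER deltas are differences in percentage points between two attacks' WERs, which the source does not state explicitly. The function name `word_error_rate` is illustrative, not taken from the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER as a percentage using word-level edit distance."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = minimum edits turning the first i-1 reference words into hyp[:j]
    prev = list(range(len(hyp) + 1))  # empty reference -> j insertions
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)   # i deletions to reach an empty hypothesis
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # delete ref[i-1]
                curr[j - 1] + 1,     # insert hyp[j-1]
                prev[j - 1] + cost,  # substitute or match
            )
        prev = curr
    return 100.0 * prev[-1] / len(ref)
```

Under the percentage-point reading, an attack's gain over a baseline is `word_error_rate(ref, attack_hyp) - word_error_rate(ref, baseline_hyp)`. Because insertions are counted, WER can exceed 100.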

Key takeaway

For AI security engineers evaluating ASR robustness: defenses that only bound perturbations in the raw waveform are not sufficient on their own. This feature-vocoder attack exposes a significant blind spot, reaching a WER 26.6 higher than the state-of-the-art baseline on black-box models. Extend defense strategies to cover perturbations in self-supervised learning representations, not just raw audio, and prioritize defenses that operate in the feature space to mitigate these transferable adversarial threats.

Key insights

ASR adversarial attacks can achieve superior transferability and defense evasion by perturbing self-supervised learning features instead of raw waveforms.

Principles

Method

The Clean-Referenced Feature-Vocoder Attack perturbs SSL representations and reconstructs them with a vocoder into speech-like adversarial waveforms; the perturbation is optimized against a public surrogate model such as Whisper-small.
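The method described above can be sketched as projected sign-gradient ascent in feature space with the vocoder inside the optimization loop. This is a toy illustration in Python with NumPy under several assumptions, not the authors' implementation: random linear matrices stand in for the SSL encoder, the vocoder, and the surrogate (the paper uses Whisper-small, not a classifier), cross-entropy stands in for the ASR loss, and "clean-referenced" is interpreted here as keeping the adversarial features within an L-infinity ball of radius `eps` around the clean features. All names (`W_enc`, `W_voc`, `W_sur`, `feature_vocoder_attack`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy stand-ins: linear "SSL encoder", "vocoder", and surrogate.
T, D, C = 64, 16, 5  # waveform length, feature size, surrogate output classes
W_enc = rng.normal(size=(D, T)) / np.sqrt(T)  # waveform -> features
W_voc = rng.normal(size=(T, D)) / np.sqrt(D)  # features -> waveform
W_sur = rng.normal(size=(C, T)) / np.sqrt(T)  # waveform -> surrogate logits


def softmax(s):
    e = np.exp(s - s.max())  # shift for numerical stability
    return e / e.sum()


def surrogate_loss_and_grad(wave, label):
    """Cross-entropy of the toy surrogate and its gradient w.r.t. the waveform."""
    p = softmax(W_sur @ wave)
    loss = -np.log(p[label])
    grad_logits = p.copy()
    grad_logits[label] -= 1.0
    return loss, W_sur.T @ grad_logits


def feature_vocoder_attack(clean_wave, label, eps=0.5, alpha=0.05, steps=40):
    """Maximize surrogate loss by perturbing features, not waveform samples."""
    z_clean = W_enc @ clean_wave       # clean features act as the reference
    delta = np.zeros_like(z_clean)     # perturbation lives in feature space
    for _ in range(steps):
        adv_wave = W_voc @ (z_clean + delta)  # vocode back to a waveform
        _, g_wave = surrogate_loss_and_grad(adv_wave, label)
        g_delta = W_voc.T @ g_wave            # chain rule through the vocoder
        # Ascend, then project back into the eps-ball around the clean features.
        delta = np.clip(delta + alpha * np.sign(g_delta), -eps, eps)
    return W_voc @ (z_clean + delta), delta
```

Because the budget `eps` applies to features, the waveform change relative to the vocoded clean signal is constrained only indirectly through the vocoder, which is the property the summary credits for slipping past waveform-bounded defenses. A real pipeline would obtain gradients by automatic differentiation through the SSL encoder, the vocoder, and a transcription loss on the surrogate rather than from hand-derived matrix transposes.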

Best for: Research Scientist, CTO, VP of Engineering/Data, AI Scientist, AI Security Engineer, Machine Learning Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.