International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2026

2026-04-30 · Source: Apple Machine Learning Research · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Speech and Natural Language Processing · Depth: Expert, quick

Summary

Apple will present three new research papers at the International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2026, held in Barcelona, Spain, from May 4 to 8. The company is also a sponsor of the conference, which convenes scientific and industrial communities focused on signal processing, and will host booth #P2 at the Centre de Convencions Internacional de Barcelona (CCIB). The three accepted papers are "Leveraging Audio-Visual Data to Reduce the Multilingual Gap in Self-Supervised Speech Models," a poster on Wednesday, May 6; "StereoFoley: Object-Aware Stereo Audio Generation from Video," a poster on Friday, May 8; and "Principled Coarse-Grained Acceptance for Speculative Decoding in Speech," an oral presentation on Friday, May 8.

Key takeaway

Research scientists developing speech and audio technologies should review Apple's ICASSP 2026 presentations for emerging techniques in multilingual speech modeling, efficient speech decoding, and video-driven audio generation. Consider how these methods could apply to your current projects, particularly those requiring robust multilingual support or audio synthesis from visual cues.

Key insights

Apple's three ICASSP 2026 papers span self-supervised speech learning with audio-visual data, speculative decoding for speech, and object-aware stereo audio generation from video.

Principles

Audio-visual data can help reduce the multilingual gap in self-supervised speech models.
Speculative decoding, paired with a coarse-grained acceptance rule, aims to make speech decoding more efficient.
Video context enables object-aware stereo audio generation.

Method

The research includes methods for leveraging audio-visual data in self-supervised speech models, applying principled coarse-grained acceptance for speculative decoding in speech, and generating object-aware stereo audio from video input.
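For readers unfamiliar with speculative decoding, the sketch below shows the standard token-level acceptance step used in speculative sampling generally: a cheap draft model proposes a token, and the target model accepts it with probability min(1, p/q) or resamples from the residual distribution. This is not the coarse-grained acceptance rule from Apple's paper, whose details are not given in this summary; the function name `speculative_accept` and its parameters are hypothetical and chosen only for illustration.

```python
import random


def speculative_accept(draft_token, p_target, q_draft, rng):
    """One standard token-level speculative sampling step (illustrative only).

    draft_token: index of the token proposed by the draft model.
    p_target, q_draft: probability lists over the same vocabulary from the
        target and draft models (hypothetical inputs for this sketch).
    rng: a random.Random instance.
    Returns (token, accepted).
    """
    p = p_target[draft_token]
    q = q_draft[draft_token]

    # Accept the drafted token with probability min(1, p / q).
    if q > 0 and rng.random() < min(1.0, p / q):
        return draft_token, True

    # On rejection, resample from the renormalized residual max(0, p - q).
    residual = [max(0.0, pt - qd) for pt, qd in zip(p_target, q_draft)]
    if sum(residual) == 0.0:
        # Degenerate case (identical distributions): sample from the target.
        residual = list(p_target)
    token = rng.choices(range(len(residual)), weights=residual, k=1)[0]
    return token, False
```

With this rule, the emitted token follows the target model's distribution even though most proposals come from the cheaper draft model, which is the source of the efficiency gain.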

In practice

Explore audio-visual data for multilingual speech tasks.
Investigate speculative decoding for efficient speech systems.
Apply video analysis for context-aware audio synthesis.

Topics

ICASSP 2026
Apple Machine Learning Research
Audio-Visual Speech Processing
Self-Supervised Speech Models
Stereo Audio Generation

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Apple Machine Learning Research.