Texture-Shape Bias Balancing for Robust Synthetic-to-Real Semantic Segmentation in Automotive NIR Imagery

2026-06-13 · Source: Computer Vision and Pattern Recognition · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Robotics & Autonomous Systems · Depth: Expert, quick

Summary

A new generative augmentation framework addresses the scarcity of high-quality annotated real-world near-infrared (NIR) data for automotive semantic segmentation. It transforms synthetic images into realistic NIR-style variants using target style adaptation (TSA): a latent diffusion model is fine-tuned via low-rank adaptation on a small curated set of real NIR images, and the fine-tuned model is then applied to the synthetic training data with structure-preserving multi-signal conditioning. To further improve robustness and reduce texture bias, a Voronoi-based style diversification (VSD) strategy modifies the original textures while preserving scene geometry. Experiments across multiple model architectures on vehicle interior and street scene NIR data show that this bias balancing significantly improves segmentation robustness, reducing the domain gap by up to 63.6% on exterior data and 28.4% on interior data.
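The summary reports the result as a relative reduction in the domain gap, but does not state how the gap is computed. One common reading, given here only as an assumption, is that the gap is the difference in mIoU between a reference model and a model trained on synthetic data, and the reduction is the fraction of that gap closed by augmentation. The function names and the numbers below are hypothetical and are not results from the paper.

```python
def domain_gap(reference_miou: float, model_miou: float) -> float:
    """Score difference between a reference model and the evaluated model (assumed definition)."""
    return reference_miou - model_miou


def relative_gap_reduction(reference_miou: float, baseline_miou: float, augmented_miou: float) -> float:
    """Fraction of the baseline gap that the augmented model closes."""
    baseline_gap = domain_gap(reference_miou, baseline_miou)
    if baseline_gap <= 0:
        raise ValueError("baseline must score below the reference for a gap to exist")
    augmented_gap = domain_gap(reference_miou, augmented_miou)
    return (baseline_gap - augmented_gap) / baseline_gap


# Made-up scores for illustration only.
reduction = relative_gap_reduction(reference_miou=0.80, baseline_miou=0.60, augmented_miou=0.70)
print(f"gap reduced by {reduction:.1%}")
```

Under this reading, a 63.6% reduction means the augmented model closes a little under two thirds of the distance to the reference, not that its mIoU rises by 63.6%.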

Key takeaway

Machine learning engineers developing automotive perception systems with NIR imagery should consider generative augmentation frameworks to overcome data scarcity. Combining target style adaptation (TSA) with Voronoi-based style diversification (VSD) reduced the synthetic-to-real domain gap by up to 63.6% on exterior data and 28.4% on interior data, which makes synthetic datasets more useful for training segmentation models that must work on real NIR images.

Key insights

A generative augmentation framework reduces the synthetic-to-real domain gap for automotive NIR semantic segmentation by balancing texture-shape bias.

Principles

Domain adaptation bridges synthetic-to-real gaps.
Balancing inductive bias boosts segmentation robustness.
Generative augmentation transforms synthetic images into realistic NIR-style variants.

Method

The framework uses target style adaptation (TSA) to fine-tune a latent diffusion model via low-rank adaptation on a small curated set of real NIR images. The fine-tuned model is then applied to synthetic data with structure-preserving multi-signal conditioning, and Voronoi-based style diversification (VSD) modifies the original textures while preserving scene geometry.
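To make the low-rank adaptation step concrete, here is a minimal NumPy sketch of the general LoRA idea: the pretrained weight stays frozen and only a small low-rank update is trained. This is not the paper's implementation. The class name, rank, scaling factor, and layer size are illustrative assumptions, and a real setup would apply this inside the diffusion model's layers using a deep learning framework.

```python
import numpy as np


class LoRALinear:
    """Linear layer y = x W^T with a trainable low-rank update (illustrative sketch)."""

    def __init__(self, weight: np.ndarray, rank: int = 4, alpha: float = 8.0, seed: int = 0):
        rng = np.random.default_rng(seed)
        out_features, in_features = weight.shape
        self.weight = weight  # frozen pretrained weight, never updated
        # Trainable factors: A starts small and random, B starts at zero,
        # so the adapted layer initially matches the pretrained one exactly.
        self.A = rng.normal(0.0, 0.01, size=(rank, in_features))
        self.B = np.zeros((out_features, rank))
        self.scale = alpha / rank

    def __call__(self, x: np.ndarray) -> np.ndarray:
        base = x @ self.weight.T
        update = (x @ self.A.T) @ self.B.T
        return base + self.scale * update

    def trainable_parameters(self) -> int:
        return self.A.size + self.B.size


rng = np.random.default_rng(1)
frozen = rng.normal(size=(64, 64))
layer = LoRALinear(frozen, rank=4)
x = rng.normal(size=(2, 64))
print(layer(x).shape, layer.trainable_parameters(), frozen.size)
```

With rank 4 on a 64x64 layer, only 512 values are trainable against 4096 frozen ones, which is why this kind of fine-tuning can work with a small curated set of real images.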

In practice

Apply TSA for synthetic image style transfer.
Use VSD to reduce texture bias.
Adapt models for NIR automotive vision.
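The Voronoi idea behind VSD can be illustrated with a deliberately simplified stand-in. The summary does not describe how styles are produced or applied per region, so the sketch below only shows the partitioning step and replaces the style change with a random per-cell gain and offset on a grayscale image. All function names and parameter values are hypothetical. The segmentation labels are never touched, which is the sense in which appearance changes while the annotated geometry stays the same.

```python
import numpy as np


def voronoi_cells(height: int, width: int, num_cells: int, rng: np.random.Generator) -> np.ndarray:
    """Label each pixel with the index of its nearest random seed point."""
    seeds = rng.uniform([0, 0], [height, width], size=(num_cells, 2))
    ys, xs = np.mgrid[0:height, 0:width]
    pixels = np.stack([ys, xs], axis=-1).astype(float)  # shape (H, W, 2)
    # Distance from every pixel to every seed, shape (H, W, num_cells).
    dists = np.linalg.norm(pixels[:, :, None, :] - seeds[None, None, :, :], axis=-1)
    return dists.argmin(axis=-1)


def diversify_texture(image: np.ndarray, num_cells: int = 16, seed: int = 0):
    """Apply a different random intensity change inside each Voronoi cell.

    Stand-in for a real style change; expects a 2D image with values in [0, 1].
    """
    rng = np.random.default_rng(seed)
    cells = voronoi_cells(*image.shape, num_cells, rng)
    gain = rng.uniform(0.6, 1.4, size=num_cells)
    bias = rng.uniform(-0.1, 0.1, size=num_cells)
    out = image * gain[cells] + bias[cells]
    return np.clip(out, 0.0, 1.0), cells


# Random noise stands in for a synthetic NIR-style frame.
synthetic = np.random.default_rng(42).uniform(size=(64, 64))
augmented, cells = diversify_texture(synthetic, num_cells=16, seed=0)
print(augmented.shape, np.unique(cells).size)
```

Because the cell boundaries are random and unrelated to object boundaries, a model trained on such images gets less benefit from memorizing local appearance, which is the motivation for pushing it toward shape cues.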

Topics

Semantic Segmentation
Near-Infrared Imaging
Domain Adaptation
Generative Models
Latent Diffusion Models
Automotive Perception
Texture-Shape Bias

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, Computer Vision Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Computer Vision and Pattern Recognition.