LUMINA-26: Low-Light Understanding for Modeling and Interpreting Night-time Actions

2026-06-22 · Source: Takara TLDR - Daily AI Papers · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Computer Vision · Depth: Expert, medium

Summary

Researchers introduce LUMINA-26, a new dataset for low-light human action recognition, a task made difficult by poor illumination, noise, and motion ambiguity. The dataset comprises 6,784 video clips across 26 action classes, featuring 22 subjects recorded in 20 diverse indoor and outdoor environments under natural low-light conditions. Alongside LUMINA-26, the authors propose Illumi-Net, an Illumination-Adaptive Mixture-of-Experts Network. Illumi-Net uses video-level illumination cues to guide adaptive enhancement and transformer-based spatio-temporal feature extraction, and combines the resulting predictions through expert-conditioned decision fusion. On the existing ELLAR benchmark, Illumi-Net surpasses previously reported results, reaching a Top-1 accuracy of 55.13% and a Top-5 accuracy of 78.87%. It also establishes a strong baseline on the new LUMINA-26 dataset, with a Top-1 accuracy of 75.95% and a Top-5 accuracy of 93.58%, providing a practical reference point for future research.
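
The Top-1 and Top-5 figures in the summary count a clip as correct when its true class is the single highest-scoring prediction, or among the five highest-scoring ones. A minimal NumPy sketch of this standard metric follows; the function name and the toy scores are illustrative assumptions, not code or data from the paper.

```python
import numpy as np

def top_k_accuracy(logits: np.ndarray, labels: np.ndarray, k: int) -> float:
    """Fraction of samples whose true label is among the k highest-scoring classes.

    logits: (N, C) class scores; labels: (N,) integer class indices.
    """
    # argsort is ascending, so the last k columns hold the k largest scores.
    top_k = np.argsort(logits, axis=1)[:, -k:]
    hits = (top_k == labels[:, None]).any(axis=1)
    return float(hits.mean())

# Toy example: 4 clips, 5 classes (made-up scores, not LUMINA-26 results).
logits = np.array([
    [0.1, 0.7, 0.1, 0.05, 0.05],
    [0.3, 0.2, 0.4, 0.05, 0.05],
    [0.6, 0.1, 0.1, 0.1, 0.1],
    [0.2, 0.2, 0.2, 0.3, 0.1],
])
labels = np.array([1, 0, 0, 4])
print(top_k_accuracy(logits, labels, k=1))  # → 0.5
print(top_k_accuracy(logits, labels, k=2))  # → 0.75
```

For a 26-class dataset such as LUMINA-26, Top-5 corresponds to k=5. Note that `np.argsort` does not guarantee an order for tied scores, so ties at the k-th position can affect the result.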

Key takeaway

For computer vision engineers developing robust action recognition systems, LUMINA-26 is worth considering as a new, diverse benchmark for low-light scenarios. Illumination-adaptive techniques like those in Illumi-Net can improve model performance under challenging lighting by mitigating noise and motion ambiguity, which may help models generalize better to real-world night-time applications.

Key insights

A new dataset and adaptive network significantly advance low-light human action recognition by addressing data and model limitations.

Principles

Low-light action recognition requires diverse, realistic data.
Illumination cues can guide adaptive model enhancement.
Mixture-of-Experts improves decision fusion.
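
To make the second principle concrete, here is a minimal sketch, assuming a clip is a NumPy array of shape (T, H, W, 3) with values in [0, 1]. It summarizes the whole clip with one video-level cue (mean Rec. 601 luminance) and picks a single gamma correction from that cue. Gamma correction is a classic stand-in chosen for illustration; the summary does not specify Illumi-Net's enhancement module, and the function names, target brightness, and gamma bounds are hypothetical.

```python
import numpy as np

# Rec. 601 luma weights: convert RGB to a single brightness value per pixel.
LUMA_WEIGHTS = np.array([0.299, 0.587, 0.114])

def video_illumination_cue(video: np.ndarray) -> float:
    """Mean luminance over every pixel of every frame of a (T, H, W, 3) clip in [0, 1]."""
    luma = video @ LUMA_WEIGHTS  # contracts the RGB axis -> (T, H, W)
    return float(luma.mean())

def adaptive_gamma(video: np.ndarray, target: float = 0.5,
                   min_gamma: float = 0.3, max_gamma: float = 1.0) -> tuple[np.ndarray, float]:
    """Brighten a clip with one gamma chosen from its video-level illumination cue.

    gamma = log(target) / log(cue) would map a pixel at the clip's mean luminance
    to `target`. Clamping to [min_gamma, max_gamma] caps how aggressively very dark
    clips are brightened (which also amplifies noise) and leaves bright clips
    unchanged (gamma = 1). All parameter values here are hypothetical.
    """
    cue = float(np.clip(video_illumination_cue(video), 1e-6, 1 - 1e-6))
    gamma = float(np.clip(np.log(target) / np.log(cue), min_gamma, max_gamma))
    # One gamma for all frames keeps brightness consistent across time.
    return np.power(video, gamma), gamma

rng = np.random.default_rng(0)
dark_clip = rng.uniform(0.0, 0.1, size=(8, 32, 32, 3))  # synthetic "night-time" clip
enhanced, gamma = adaptive_gamma(dark_clip)
print(round(video_illumination_cue(dark_clip), 3), gamma,
      round(video_illumination_cue(enhanced), 3))
```

Deriving one cue per video rather than per frame keeps the correction identical across frames, which may help avoid frame-to-frame brightness flicker that a temporal model could confuse with motion.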

Method

Illumi-Net uses video-level illumination cues for adaptive enhancement, followed by transformer-based spatio-temporal feature extraction. Expert-conditioned decision fusion then combines outputs for robust low-light human action recognition.
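
The final fusion step can be illustrated with a small gating sketch. Assuming each of E hypothetical experts outputs class logits for a clip, a gate maps the scalar illumination cue to softmax weights, and the fused prediction is the weighted average of the experts' class probabilities. The linear gate, its hand-set parameters, and the two-expert setup are illustrative assumptions; in a trained model the gate would be learned, and the summary does not specify Illumi-Net's exact gating design.

```python
import numpy as np

def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    z = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def gated_fusion(expert_logits: np.ndarray, cue: float,
                 gate_w: np.ndarray, gate_b: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Fuse per-expert predictions with weights conditioned on an illumination cue.

    expert_logits: (E, C) class logits from E experts for one clip.
    gate_w, gate_b: (E,) parameters of a hypothetical linear gate on the scalar cue.
    Returns fused class probabilities (C,) and gate weights (E,).
    """
    gate = softmax(gate_w * cue + gate_b)    # (E,) non-negative weights summing to 1
    probs = softmax(expert_logits, axis=-1)  # (E, C) per-expert class probabilities
    fused = gate @ probs                     # (C,) convex combination of the experts
    return fused, gate

# Hypothetical, hand-set gate parameters (would be learned in practice):
# a darker cue shifts weight toward expert 0, a brighter cue toward expert 1.
gate_w = np.array([-10.0, 10.0])
gate_b = np.array([1.0, -1.0])
expert_logits = np.array([
    [2.0, 0.5, 0.1],  # expert 0 favours class 0
    [0.1, 2.5, 0.3],  # expert 1 favours class 1
])
for cue in (0.05, 0.4):
    fused, gate = gated_fusion(expert_logits, cue, gate_w, gate_b)
    print(cue, gate.round(3), int(fused.argmax()))
```

Because the gate weights sum to 1, the fused output remains a valid probability distribution, and the illumination cue alone decides which expert dominates a given clip.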

In practice

Utilize LUMINA-26 as a benchmark dataset.
Integrate illumination-adaptive enhancement.

Topics

Low-Light Vision
Human Action Recognition
Video Datasets
Illumi-Net
Mixture-of-Experts
Transformer Networks
Spatio-Temporal Features

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, Computer Vision Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Takara TLDR - Daily AI Papers.