Adopting a human developmental visual diet yields robust and shape-based AI vision

· Source: Nature Machine Intelligence · Field: Technology & Digital — Artificial Intelligence & Machine Learning · Depth: Expert, extended

Summary

A new study introduces a "developmental visual diet" (DVD) for training AI vision systems, inspired by how human vision matures from birth to 25 years of age. The approach gradually increases the visual acuity, contrast sensitivity, and chromatic sensitivity available to the model during training. It targets a known misalignment between artificial and human vision: AI models often rely on texture rather than shape. In experiments with several deep neural networks (DNNs), including ResNet-50, trained on mini-ecoset, ecoset, and ImageNet-1K, DVD-trained models reached a substantially higher shape bias (up to 0.94, comparable to human levels of 0.90-0.97) than baseline models (0.2-0.4). These models were also better at recognizing abstract shapes embedded in complex backgrounds, where they outperformed large AI foundation models, and they were more robust to image degradations (e.g., blur, noise, weather effects) and adversarial attacks. The research highlights that guiding *how* a model learns, rather than just *how much*, offers a resource-efficient path to more human-like and robust AI vision.
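The shape-bias figures quoted above can be made concrete with a small sketch. It assumes the study uses the standard cue-conflict definition, in which each test image pairs the shape of one class with the texture of another, and shape bias is the fraction of shape decisions among all decisions that match either the shape or the texture class. The function name and the toy labels are illustrative, not taken from the paper.

```python
def shape_bias(predictions, shape_labels, texture_labels):
    """Fraction of shape-consistent decisions on cue-conflict images.

    Assumed definition: shape_hits / (shape_hits + texture_hits).
    Predictions matching neither cue are ignored.
    """
    shape_hits = 0
    texture_hits = 0
    for pred, shape, texture in zip(predictions, shape_labels, texture_labels):
        if pred == shape:
            shape_hits += 1
        elif pred == texture:
            texture_hits += 1
    decided = shape_hits + texture_hits
    if decided == 0:
        raise ValueError("no prediction matched a shape or texture label")
    return shape_hits / decided


# Toy example: four cue-conflict images, all cat-shaped.
predictions = ["cat", "dog", "cat", "car"]
shape_labels = ["cat", "cat", "cat", "cat"]
texture_labels = ["dog", "dog", "elephant", "dog"]
bias = shape_bias(predictions, shape_labels, texture_labels)
```

In this toy case two predictions follow shape, one follows texture, and one matches neither, so the score is two out of three. A value near 1.0 means the model classifies almost entirely by shape.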

Key takeaway

AI engineers and research scientists developing computer vision systems should consider integrating the Developmental Visual Diet (DVD) preprocessing pipeline into their training regimes. By simulating human visual maturation, the method substantially increases shape bias, improves abstract shape recognition to the point of outperforming larger foundation models, and strengthens robustness against image degradations and adversarial attacks. Adopting DVD can yield more human-aligned and reliable AI vision systems without massive increases in data or model parameters, which makes it a computationally efficient path to improved performance.

Key insights

Mimicking human visual development in AI training fosters shape-based perception and robustness.

Method

The DVD pipeline applies age-dependent Gaussian blur for visual acuity, frequency-domain thresholding for contrast sensitivity, and linear interpolation for chromatic sensitivity to training images, simulating human visual maturation.
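The three stages described above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the linear age-to-parameter schedule, the parameter names (`mature_age_months`, `max_sigma`, `max_threshold`), and their default values are hypothetical placeholders, whereas the study derives its schedule from human developmental data. The blur is applied in the Fourier domain, so image borders wrap around.

```python
import numpy as np


def _radial_frequency(height, width):
    """Radial spatial frequency (cycles per pixel) of each FFT coefficient."""
    fy = np.fft.fftfreq(height)[:, None]
    fx = np.fft.fftfreq(width)[None, :]
    return np.sqrt(fx ** 2 + fy ** 2)


def gaussian_blur(image, sigma):
    """Visual acuity: blur an (H, W, C) float image with std `sigma` pixels."""
    if sigma <= 0:
        return image.copy()
    radius = _radial_frequency(*image.shape[:2])
    # Fourier transform of a Gaussian kernel: exp(-2 * pi^2 * sigma^2 * f^2).
    transfer = np.exp(-2.0 * (np.pi * sigma * radius) ** 2)
    spectrum = np.fft.fft2(image, axes=(0, 1))
    return np.real(np.fft.ifft2(spectrum * transfer[:, :, None], axes=(0, 1)))


def contrast_threshold(image, threshold):
    """Contrast sensitivity: drop Fourier components with low amplitude."""
    spectrum = np.fft.fft2(image, axes=(0, 1))
    amplitude = np.abs(spectrum) / (image.shape[0] * image.shape[1])
    keep = amplitude >= threshold
    keep[0, 0, :] = True  # always keep the mean luminance (DC term)
    return np.real(np.fft.ifft2(spectrum * keep, axes=(0, 1)))


def chromatic_blend(image, colour_weight):
    """Chromatic sensitivity: interpolate linearly from grayscale to colour."""
    gray = image.mean(axis=2, keepdims=True)
    return (1.0 - colour_weight) * gray + colour_weight * image


def developmental_visual_diet(image, age_months, mature_age_months=300.0,
                              max_sigma=4.0, max_threshold=0.05):
    """Apply all three stages for a simulated age.

    ASSUMPTION: a linear schedule from birth to `mature_age_months`
    (300 months = 25 years); all default values are hypothetical.
    """
    maturity = float(np.clip(age_months / mature_age_months, 0.0, 1.0))
    out = gaussian_blur(image, max_sigma * (1.0 - maturity))
    out = contrast_threshold(out, max_threshold * (1.0 - maturity))
    out = chromatic_blend(out, maturity)
    return np.clip(out, 0.0, 1.0)


# Usage: images are floats in [0, 1] with shape (H, W, 3).
rng = np.random.default_rng(0)
img = rng.random((32, 32, 3))
newborn_view = developmental_visual_diet(img, age_months=0)
adult_view = developmental_visual_diet(img, age_months=300)
```

With this schedule a simulated newborn sees a blurred, low-contrast grayscale image, and a simulated 25-year-old sees the input essentially unchanged. In a training loop, `age_months` would be advanced as training progresses, so early epochs see degraded images and later epochs see full-quality ones.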

Best for: AI Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, Computer Vision Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Nature Machine Intelligence.