Synthetic Benchmarks Overstate Forward-Forward Scaling: Real-Data Limits of Layer-Local Training
Summary
DTG-FF, a novel Forward-Forward (FF) learning instrument, sets a new performance bar for FF-family models across nine real-data benchmarks, achieving 91.8% on CIFAR-10 and the first FF baseline at ImageNet-100 224x224. Despite these advances, a rigorous audit using DTG-FF reveals significant scaling limitations for layer-local training on real-world data. An architecture-matched BP-DeepSup baseline surpasses DTG-FF by 2.40 and 5.93 percentage points on CIFAR-10 and CIFAR-100 respectively, so the gap widens as the class count grows. At 224x224 resolution, DTG-FF reaches only 49.4%, substantially below typical BP performance exceeding 75%, exposing a real-data ceiling. The study also identifies a "K-conflict," where synthetic benchmarks overstate FF's transferability. A systems audit on 8 GB hardware shows DTG-FF consumes 7.90 GB and processes 138 images/second, while BP with gradient accumulation uses 4.18 GB and processes 157 images/second, challenging FF's memory efficiency claims at this scale.
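To put the reported figures side by side, the short calculation below derives the memory, throughput, and accuracy-gap ratios directly from the numbers quoted in this summary. The variable names are illustrative; no values beyond those quoted above are assumed.

```python
# Figures quoted in the summary (systems audit on 8 GB hardware).
ff_mem_gb, ff_img_per_s = 7.90, 138   # DTG-FF
bp_mem_gb, bp_img_per_s = 4.18, 157   # BP with gradient accumulation

mem_ratio = ff_mem_gb / bp_mem_gb               # DTG-FF memory relative to BP
throughput_ratio = ff_img_per_s / bp_img_per_s  # DTG-FF speed relative to BP

# Accuracy gaps (BP-DeepSup minus DTG-FF), in percentage points.
gap_cifar10, gap_cifar100 = 2.40, 5.93
gap_growth = gap_cifar100 / gap_cifar10         # growth from 10 to 100 classes

print(f"memory ratio:     {mem_ratio:.2f}x")         # 1.89x
print(f"throughput ratio: {throughput_ratio:.2f}x")  # 0.88x
print(f"gap growth:       {gap_growth:.2f}x")        # 2.47x
```

On these numbers, DTG-FF uses roughly 1.9 times the memory of the BP baseline while running at about 88% of its throughput, and the accuracy gap grows about 2.5 times between CIFAR-10 and CIFAR-100.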
Key takeaway
Machine Learning Engineers evaluating layer-local training methods such as Forward-Forward for large-scale computer vision tasks should recognize the methods' current limitations. This research indicates that FF models, despite recent improvements, do not scale effectively on real-world data beyond 32x32 resolution and offer no memory advantage over standard backpropagation with gradient accumulation on commodity 8 GB hardware. Prioritize backpropagation for robust performance and efficient resource utilization in production systems.
Key insights
Synthetic benchmarks overstate how well Forward-Forward learning scales to real data, and its claimed memory advantage over backpropagation does not hold against a gradient-accumulation baseline on 8 GB hardware.
Principles
- Synthetic benchmarks can overstate model transferability.
- Real-data scaling limits are invisible at small resolutions.
- Memory efficiency claims require fair, real-world baselines.
Method
DTG-FF integrates dynamic temperature goodness, decoupled normalization, and multi-layer fusion to enhance Forward-Forward learning, enabling rigorous auditing of its real-data scaling limits.
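This summary does not specify how DTG-FF defines dynamic temperature goodness, decoupled normalization, or multi-layer fusion. As background only, the sketch below shows a commonly used form of the basic layer-local Forward-Forward objective: goodness is the sum of squared activations, compared against a threshold through a logistic. The names `goodness`, `ff_layer_loss`, `theta`, and the fixed `temperature` scale are illustrative assumptions, not the paper's formulation.

```python
import math

def goodness(activations):
    """Goodness of one layer's activations: the sum of squares."""
    return sum(a * a for a in activations)

def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def ff_layer_loss(pos_acts, neg_acts, theta=2.0, temperature=1.0):
    """Layer-local loss: push positive goodness above theta, negative below.

    `temperature` is a hypothetical fixed scale on the logit; the paper's
    dynamic variant is not described in this summary.
    """
    pos_logit = (goodness(pos_acts) - theta) / temperature
    neg_logit = (goodness(neg_acts) - theta) / temperature
    # Equals -log(sigmoid(pos_logit)) - log(1 - sigmoid(neg_logit)).
    return softplus(-pos_logit) + softplus(neg_logit)

# Toy activations for one layer: a "positive" and a "negative" sample.
pos = [1.5, 1.0, 0.5]   # goodness 3.5, above theta
neg = [0.5, 0.5, 0.0]   # goodness 0.5, below theta

well_separated = ff_layer_loss(pos, neg)
swapped = ff_layer_loss(neg, pos)   # wrong way round, so the loss is higher
print(well_separated < swapped)     # True
```

Because the loss depends only on one layer's activations, each layer can be updated without propagating gradients through the rest of the network, which is the property the audit stresses at scale.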
In practice
- Use 224x224 ImageNet-100 for FF scaling audits.
- Avoid relying on synthetic K-sweeps to judge real-world transferability.
- Benchmark FF memory against BP+gradient-accumulation.
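The last point treats BP with gradient accumulation as the fair memory baseline. The reason is that gradients summed over small micro-batches reproduce the full-batch gradient while only one micro-batch needs to be held at a time. The sketch below checks that equivalence on a one-parameter linear model in plain Python; the data, function names, and micro-batch size are illustrative assumptions, and the code does not measure memory.

```python
def grad_sq_error_sum(w, xs, ys):
    """Gradient w.r.t. w of sum((w*x - y)**2) over a batch."""
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys))

def full_batch_grad(w, xs, ys):
    """Mean-squared-error gradient computed on the whole batch at once."""
    return grad_sq_error_sum(w, xs, ys) / len(xs)

def accumulated_grad(w, xs, ys, micro_batch_size):
    """Same gradient, accumulated over micro-batches processed one at a time."""
    total = 0.0
    for start in range(0, len(xs), micro_batch_size):
        mb_x = xs[start:start + micro_batch_size]
        mb_y = ys[start:start + micro_batch_size]
        # Only this micro-batch is needed to compute its contribution.
        total += grad_sq_error_sum(w, mb_x, mb_y)
    return total / len(xs)

xs = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
ys = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]  # generated by y = 2x
w = 0.5

g_full = full_batch_grad(w, xs, ys)
g_accum = accumulated_grad(w, xs, ys, micro_batch_size=2)
print(abs(g_full - g_accum) < 1e-9)  # True
```

In an actual audit, peak memory and throughput for both methods would be recorded with the training framework's own profiling tools at a matched effective batch size.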
Topics
- Forward-Forward Learning
- Layer-Local Training
- Real-Data Benchmarks
- Backpropagation
- Model Scaling
- Deep Learning Systems
Best for: Research Scientist, Computer Vision Engineer, AI Scientist, Machine Learning Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Computer Vision and Pattern Recognition.