A sneak peek at TorchVision v0.11 – Memoirs of a TorchVision developer – 2

2021-10-10 · Source: Datumbox · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Software Development & Engineering · Depth: Advanced, short

Summary

TorchVision v0.11, released alongside PyTorch v1.10, introduces several significant user-facing features across models, data augmentations, operators, layers, and training recipes. Key additions include implementations of the RegNet architecture with 14 pre-trained variants and of EfficientNet B0-B7. New data augmentation techniques, namely TrivialAugment, RandAugment, Mixup, and CutMix, are also integrated. The update brings new operators, such as a converter from semantic segmentation masks to bounding boxes, and backward-pass implementations of bilinear and bicubic interpolation with anti-aliasing on both CPU and GPU. Furthermore, common building blocks like the Squeeze-Excitation and Conv-Norm-Activation layers have been refactored, and a Stochastic Depth layer has been added. Training recipes now support Exponential Moving Average, Label Smoothing, and Learning Rate Warmup. Other improvements include an FX-based utility for extracting intermediate features, CUDA 11.3 support, and fixes for JPEG library dependency issues.
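The three training-recipe additions named above are simple to state on their own. The following is a minimal, library-free sketch of each idea, not TorchVision's implementation: the function names, default values, and the plain-list representation of weights are assumptions made for illustration.

```python
def smooth_labels(target_index, num_classes, smoothing=0.1):
    """Label smoothing: move `smoothing` of the probability mass
    from the true class to a uniform distribution over all classes."""
    off_value = smoothing / num_classes
    dist = [off_value] * num_classes
    dist[target_index] += 1.0 - smoothing
    return dist


def ema_update(ema_weights, model_weights, decay=0.99):
    """Exponential Moving Average: blend the current weights into a
    running average (updated in place and returned)."""
    for i, w in enumerate(model_weights):
        ema_weights[i] = decay * ema_weights[i] + (1.0 - decay) * w
    return ema_weights


def warmup_lr(base_lr, step, warmup_steps, start_factor=0.01):
    """Linear warmup: ramp the learning rate from
    start_factor * base_lr up to base_lr over `warmup_steps` steps."""
    if step >= warmup_steps:
        return base_lr
    factor = start_factor + (1.0 - start_factor) * step / warmup_steps
    return base_lr * factor


if __name__ == "__main__":
    print(smooth_labels(target_index=1, num_classes=4))
    print([round(warmup_lr(0.1, s, warmup_steps=5), 4) for s in range(7)])
    print(ema_update([1.0, 2.0], [0.0, 0.0], decay=0.9))
```

In a real training loop the averaged weights are typically the ones evaluated, while the optimizer keeps updating the raw weights.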

Key takeaway

For AI Scientists and Computer Vision Engineers developing new models or refining existing ones, TorchVision v0.11 is a substantial update. Your projects can benefit from the inclusion of high-performing architectures like RegNet and EfficientNet, alongside data augmentation methods such as TrivialAugment and RandAugment. Explore the new operators and refactored layers to streamline model development and potentially improve training efficiency and accuracy. The release also adds support for CUDA 11.3.
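One of the new operators converts a segmentation mask into a bounding box. The idea can be sketched without any library: the box is spanned by the smallest and largest row and column indices that contain foreground pixels. The function name, the nested-list mask format, and the inclusive (x_min, y_min, x_max, y_max) convention below are assumptions of this sketch, not a description of TorchVision's API.

```python
def mask_to_box(mask):
    """Return (x_min, y_min, x_max, y_max) for one binary mask.

    `mask` is a list of rows; any truthy value counts as foreground.
    Coordinates are inclusive pixel indices.
    """
    ys = [y for y, row in enumerate(mask) if any(row)]
    xs = [x for row in mask for x, value in enumerate(row) if value]
    if not ys:
        raise ValueError("mask has no foreground pixels")
    return (min(xs), min(ys), max(xs), max(ys))


if __name__ == "__main__":
    mask = [
        [0, 0, 0, 0, 0],
        [0, 0, 1, 1, 0],
        [0, 1, 1, 0, 0],
        [0, 0, 0, 0, 0],
    ]
    print(mask_to_box(mask))  # (1, 1, 3, 2)
```

An empty mask has no meaningful box, so this sketch raises an error rather than returning a degenerate result.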

Key insights

TorchVision v0.11 enhances deep learning workflows with new models, augmentations, operators, and layers, plus updated training recipes.

Principles

Reproduce original paper results closely.
Prioritize efficient model architectures.
Simplify and generalize augmentation strategies.

Method

The release integrates new architectures like RegNet and EfficientNet, adds augmentations such as TrivialAugment and RandAugment, and refactors common layers for improved reusability and performance.

In practice

Utilize RegNet or EfficientNet for new vision tasks.
Apply TrivialAugment for robust data augmentation.
Leverage FX-based utilities for feature extraction.
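To make the TrivialAugment recommendation concrete: the method applies exactly one randomly chosen operation per image, at a randomly chosen strength, with no hyperparameters to tune. The sketch below shows that sampling logic on a tiny grayscale image stored as nested lists. The three operations, their strength ranges, and the bin count are illustrative assumptions, not TorchVision's operation set.

```python
import random

NUM_BINS = 31  # number of discrete strength levels (illustrative choice)


def _clamp(value):
    """Round to an integer pixel value in [0, 255]."""
    return max(0, min(255, int(round(value))))


def identity(img, strength):
    return [row[:] for row in img]


def brightness(img, strength):
    # Scale every pixel by (1 + strength); strength may be negative.
    return [[_clamp(p * (1.0 + strength)) for p in row] for row in img]


def shift_x(img, strength):
    # Shift columns by a fraction of the width, filling with zeros.
    width = len(img[0])
    offset = int(round(strength * width))
    if offset >= 0:
        return [([0] * offset + row)[:width] for row in img]
    return [(row + [0] * (-offset))[-width:] for row in img]


# (operation, maximum absolute strength) pairs; hypothetical values.
OPS = [(identity, 0.0), (brightness, 0.9), (shift_x, 0.5)]


def trivial_augment(img, rng=random):
    """Apply one uniformly sampled op at a uniformly sampled strength."""
    op, max_strength = rng.choice(OPS)
    bin_index = rng.randrange(NUM_BINS)
    strength = max_strength * bin_index / (NUM_BINS - 1)
    if rng.random() < 0.5:  # random sign for signed operations
        strength = -strength
    return op(img, strength)


if __name__ == "__main__":
    image = [[10, 20, 30, 40], [50, 60, 70, 80]]
    print(trivial_augment(image, random.Random(0)))
```

Passing a seeded `random.Random` instance makes the augmentation reproducible, which helps when debugging a data pipeline.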

Topics

TorchVision v0.11
Neural Network Architectures
Data Augmentation
Training Optimization
Computer Vision Models

Best for: Computer Vision Engineer, AI Scientist, Research Scientist, AI Engineer, Machine Learning Engineer, AI Researcher

Editorial summary, takeaway, and curation by AIssential. Original article published by Datumbox.