The journey of Modernizing TorchVision – Memoirs of a TorchVision developer – 3
Summary
TorchVision has undergone significant modernization efforts, detailed in a developer's memoir covering releases v0.12 and v0.13 and the plans for 2022H2. Version 0.12 focused on updating the deprecation and model contribution policies to foster community engagement, alongside integrating new model architectures like FCOS, RAFT, Vision Transformer (ViT), and ConvNeXt, plus 19 new datasets. The v0.13 release, expected in early June 2022 at the time of writing, continues this modernization by adding data augmentation techniques such as AugMix and Large Scale Jitter, new building blocks like DropBlock and the cIoU/dIoU losses, and architectures including Swin Transformer and EfficientNetV2. This release also introduces a new Multi-weight Support API and revamped model documentation. Plans for 2022H2 include integrating MViTv2, improving the Datasets API (v2) with TorchData, extending the Transforms API (v2) to handle bounding boxes and segmentation masks, and adding architectures like DeTR.
Key takeaway
AI engineers and computer vision researchers aiming for state-of-the-art results should explore TorchVision v0.13's new data augmentation techniques, such as AugMix and Large Scale Jitter, and use the improved pre-trained weights for classification, detection, and segmentation models. The new Multi-weight Support API simplifies model instantiation and metadata access, which streamlines integrating these capabilities into an existing workflow.
Key insights
TorchVision is actively modernizing its codebase, policies, and model offerings to encourage community contributions and bring its models closer to SOTA performance.
Principles
- Clear policies drive community contributions.
- Continuous modernization closes SOTA gaps.
- API design impacts documentation and usability.
Method
TorchVision's modernization involves updating contribution/deprecation policies, integrating new SOTA models and data augmentations, and improving training recipes to boost model accuracy across tasks.
In practice
- Utilize TorchVision's new Multi-weight Support API.
- Explore updated training recipes for SOTA models.
- Contribute to TorchVision via "good first issues".
Topics
- TorchVision Modernization
- Computer Vision Models
- Data Augmentation
- Pre-trained Weights
- API Development
Best for: Computer Vision Engineer, AI Engineer, AI Scientist, Machine Learning Engineer, Deep Learning Engineer, AI Researcher
Editorial summary, takeaway, and curation by AIssential. Original article published by Datumbox.