The journey of Modernizing TorchVision – Memoirs of a TorchVision developer – 3

2022-05-21 · Source: Datumbox · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Software Development & Engineering, Data Science & Analytics · Depth: Advanced, medium

Summary

TorchVision has undergone significant modernization efforts, detailed in a developer's memoir covering releases v0.12 and v0.13, and plans for 2022H2. Version 0.12 focused on updating deprecation and model contribution policies to foster community engagement, alongside integrating new model architectures like FCOS, RAFT, Vision Transformer (ViT), and ConvNeXt, plus 19 new datasets. The upcoming v0.13 release, expected in early June, continues this modernization by adding data augmentation techniques such as AugMix and Large Scale Jitter, new building blocks like DropBlock and cIoU/dIoU loss, and architectures including Swin Transformer and EfficientNetV2. This release also introduces a new Multi-weight Support API and revamped model documentation. Future plans for 2022H2 include integrating MViTv2, improving the Datasets API (v2) with TorchData, extending the Transforms API (v2) for bounding boxes and segmentation masks, and adding architectures like DeTR.

Key takeaway

AI engineers and computer vision researchers aiming for state-of-the-art results should explore TorchVision v0.13's new data augmentation techniques, such as AugMix and Large Scale Jitter, and use the improved pre-trained weights for classification, detection, and segmentation models. The new Multi-weight Support API simplifies model instantiation and metadata access, which makes it easier to integrate these models into an existing workflow.

Key insights

TorchVision is actively modernizing its codebase, policies, and model offerings to encourage community contributions and close the gap to state-of-the-art (SOTA) performance.

Principles

Clear policies drive community contributions.
Continuous modernization closes SOTA gaps.
API design impacts documentation and usability.

Method

TorchVision's modernization involves updating contribution/deprecation policies, integrating new SOTA models and data augmentations, and improving training recipes to boost model accuracy across tasks.

In practice

Utilize TorchVision's new Multi-weight Support API.
Explore updated training recipes for SOTA models.
Contribute to TorchVision via "good first issues".

Topics

TorchVision Modernization
Computer Vision Models
Data Augmentation
Pre-trained Weights
API Development

Best for: Computer Vision Engineer, AI Engineer, AI Scientist, Machine Learning Engineer, Deep Learning Engineer, AI Researcher

Editorial summary, takeaway, and curation by AIssential. Original article published by Datumbox.