I trained a model. What is next?

2020-09-09 · Source: Kaggle Blog - Medium · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Software Development & Engineering, Data Science & Analytics · Depth: Intermediate, long

Summary

Vladimir Iglovikov, a Kaggle Grandmaster, describes how machine learning practitioners can get more value from a trained model after the competition or research project that produced it has ended. His September 2020 post lays out eight steps, each with an estimated time commitment: release the code to a public GitHub repository; improve readability with tools such as `black`, `flake8`, and `mypy`; write a thorough README; make the model easy to use by hosting weights on GitHub releases and loading them with `torch.utils.model_zoo.load_url`; turn the repository into a pip-installable Python library; create a Google Colab notebook for interactive demonstration; build a simple web application with Streamlit and Heroku; and write blog posts and academic papers to share the solution.
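As a rough illustration of the model-loading step, the sketch below builds the download URL for a file attached to a GitHub release and loads it with `torch.utils.model_zoo.load_url`. The helper names and the owner, repository, tag, and file names are hypothetical placeholders chosen for this example, not code from the original post.

```python
def release_asset_url(owner: str, repo: str, tag: str, filename: str) -> str:
    """Build the download URL of a file attached to a GitHub release.

    All four arguments are placeholders supplied by the caller.
    """
    return f"https://github.com/{owner}/{repo}/releases/download/{tag}/{filename}"


def load_pretrained(model, url: str):
    """Download a state dict from `url` and load it into `model`."""
    # Imported lazily so the URL helper can be used without PyTorch installed.
    from torch.utils import model_zoo

    # load_url downloads the file on first use and reads the local copy afterwards.
    state_dict = model_zoo.load_url(url, progress=True, map_location="cpu")
    model.load_state_dict(state_dict)
    return model
```

With weights attached to a release, a user would need only one call such as `load_pretrained(model, release_asset_url("someuser", "somerepo", "v0.0.1", "weights.pth"))`, where every argument is an invented example value.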

Key takeaway

For machine learning engineers and data scientists who want more visibility for their projects and better career opportunities, it pays to work through the post-training steps systematically. Prioritize making your code public, readable, and installable as a library, then add interactive demos and clear documentation. Doing so deepens your technical knowledge, builds your personal brand, and makes your work easier to discover and use.

Key insights

Maximize ML project impact by systematically sharing code, models, and knowledge through public platforms and documentation.

Principles

Public code doesn't need to be perfect.
Automate code quality checks early.
Prioritize user-friendliness for adoption.

Method

After training, release the code publicly, improve its readability with formatters and static checkers, write a detailed README, make the model easy to load, package the code as a library, build Colab demos and web apps, then document the work in blog posts and papers.
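The readability step lends itself to automation. The following is a minimal sketch, assuming `black`, `flake8`, and `mypy` are installed as command-line tools. The function names and the decision to skip tools that are missing are illustrative assumptions, not the author's setup.

```python
import shutil
import subprocess
from typing import Dict, List


def build_check_commands(target: str = ".") -> Dict[str, List[str]]:
    """Return the command line for each code-quality tool."""
    return {
        # --check reports files that would be reformatted without changing them.
        "black": ["black", "--check", target],
        "flake8": ["flake8", target],
        "mypy": ["mypy", target],
    }


def run_checks(target: str = ".") -> Dict[str, str]:
    """Run each available tool and record whether it passed."""
    results: Dict[str, str] = {}
    for name, command in build_check_commands(target).items():
        if shutil.which(command[0]) is None:
            # Tool is not on PATH; note it instead of raising.
            results[name] = "not installed"
            continue
        completed = subprocess.run(command, capture_output=True, text=True)
        results[name] = "passed" if completed.returncode == 0 else "failed"
    return results
```

Calling `run_checks("src")` returns a dictionary mapping each tool name to `"passed"`, `"failed"`, or `"not installed"`, which is enough to gate a commit or a CI job.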

In practice

Use `black`, `flake8`, `mypy` for code hygiene.
Host model weights on GitHub releases.
Deploy simple web apps with Streamlit/Heroku.
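To make the last item concrete, here is a small sketch of a Streamlit page that accepts an uploaded file and shows a prediction. The `predict` function is a hypothetical stand-in for a real model, and the `render` helper and the file name `app.py` are assumptions made for this example. Deployment to Heroku is not shown.

```python
from typing import Any, Callable, Dict, Optional


def predict(data: bytes) -> Dict[str, Any]:
    """Hypothetical stand-in for a real model: only reports the input size."""
    return {"num_bytes": len(data)}


def render(
    st: Any,
    predict_fn: Callable[[bytes], Dict[str, Any]] = predict,
) -> Optional[Dict[str, Any]]:
    """Draw the page using the Streamlit module passed in as `st`."""
    st.title("Model demo")
    uploaded = st.file_uploader("Upload an image", type=["jpg", "png"])
    if uploaded is None:
        # Nothing uploaded yet, so there is nothing to predict on.
        st.write("Upload a file to see a prediction.")
        return None
    result = predict_fn(uploaded.getvalue())
    st.write(result)
    return result


if __name__ == "__main__":
    try:
        import streamlit
    except ImportError:
        print("Streamlit is not installed; install it, then run: streamlit run app.py")
    else:
        render(streamlit)
```

Saved as `app.py`, the page would be started with `streamlit run app.py`. Passing the Streamlit module into `render` is a design choice that keeps the page logic testable with a simple stub object.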

Topics

Machine Learning Workflow
Code Quality
Model Deployment
Python Packaging
Technical Communication

Best for: Machine Learning Engineer, Data Scientist, AI Student

Editorial summary, takeaway, and curation by AIssential. Original article published by Kaggle Blog - Medium.