I trained a model. What is next?
Summary
Vladimir Iglovikov, a Kaggle Grandmaster, outlines a structured approach for machine learning practitioners who want a trained model to keep delivering value after the competition or research project that produced it has ended. The process, detailed in a September 2020 post, consists of eight key steps, each with an estimated time commitment. These steps include releasing the code to a public GitHub repository, improving its readability with tools like `black`, `flake8`, and `mypy`, writing a comprehensive README, and making the model easy to use by hosting weights in GitHub releases and loading them with `torch.utils.model_zoo.load_url`. Further steps involve turning the repository into a pip-installable Python library, creating Google Colab notebooks for interactive demonstrations, building a simple web application with Streamlit and Heroku, and finally writing blog posts and academic papers to share insights and solutions.
Key takeaway
Machine Learning Engineers and Data Scientists who want more visibility for their projects and better career opportunities should treat the steps after training as part of the work. Prioritize making your code public, readable, and installable as a library, then complement it with interactive demos and clear documentation. This builds your technical knowledge and personal brand, and it makes your work easier for others to find and use.
Key insights
Maximize ML project impact by systematically sharing code, models, and knowledge through public platforms and documentation.
Principles
- Public code doesn't need to be perfect.
- Automate code quality checks early.
- Prioritize user-friendliness for adoption.
Method
After training, release code publicly, enhance readability with formatters/checkers, create a detailed README, enable easy model loading, package as a library, build Colab demos and web apps, then document via blog posts and papers.
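The "easy model loading" step can be sketched as follows. This is a minimal illustration, not the author's code: it assumes the weights file attached to the GitHub release is a plain PyTorch state dict, and the account, repository, tag, and file names are hypothetical placeholders. The helper names `release_asset_url` and `load_pretrained` are also made up for this example.

```python
from typing import Any


def release_asset_url(owner: str, repo: str, tag: str, filename: str) -> str:
    """Build the download URL GitHub serves for a file attached to a release."""
    return f"https://github.com/{owner}/{repo}/releases/download/{tag}/{filename}"


def load_pretrained(model: Any, url: str) -> Any:
    """Download a state dict (cached after the first call) and load it."""
    # Imported lazily so the URL helper above works without PyTorch installed.
    from torch.utils import model_zoo

    # Assumption: the file at `url` is a state dict saved with torch.save.
    state_dict = model_zoo.load_url(url, progress=True, map_location="cpu")
    model.load_state_dict(state_dict)
    return model


# Hypothetical names: replace with your own account, repository, tag, and file.
WEIGHTS_URL = release_asset_url("your-user", "your-repo", "v0.0.1", "model.pth")
```

With something like this in the library, a user only needs to construct the model and call `load_pretrained(model, WEIGHTS_URL)`; no manual download is required.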
In practice
- Use `black`, `flake8`, `mypy` for code hygiene.
- Host model weights on GitHub releases.
- Deploy simple web apps with Streamlit/Heroku.
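As a small illustration of the code-hygiene item above, here is the kind of function these tools are meant to check: formatted the way `black` would leave it, within `flake8`'s default line length, and fully annotated so `mypy` can verify its callers. The function `mean_iou` is a hypothetical example, not taken from the original post.

```python
from typing import Sequence

# Typical invocations from the repository root (each tool installed via pip):
#   black .
#   flake8 .
#   mypy .


def mean_iou(intersections: Sequence[float], unions: Sequence[float]) -> float:
    """Mean IoU over samples; samples with an empty union are skipped."""
    ratios = [i / u for i, u in zip(intersections, unions) if u > 0]
    if not ratios:
        return 0.0
    return sum(ratios) / len(ratios)
```

Running the three tools automatically, for example before each commit, keeps a public repository readable without manual review effort.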
Topics
- Machine Learning Workflow
- Code Quality
- Model Deployment
- Python Packaging
- Technical Communication
Best for: Machine Learning Engineer, Data Scientist, AI Student
Editorial summary, takeaway, and curation by AIssential. Original article published by Kaggle Blog - Medium.