Kaggle Solution Walkthroughs: LEAP - Atmospheric Physics using AI (ClimSim) with TeamZ Lab 数据实验室
Summary
A data science competition team from China with multiple Kaggle medals details its winning solution to a multi-label regression task. The team trained on 75 million samples from the low-resolution dataset (years 1-8) and held out 0.8 million samples for validation. The core of the method is two-stage training: a model is first trained on all 368 labels, then fine-tuned seven times, once for each of seven distinct label groups, which improved the score by at least 0.001. Training ran for 9 epochs under a cosine scheduler with one restart (two periods of 3 and 6 epochs) plus early stopping, and optimized a Smooth L1 loss together with an auxiliary difference loss. The models were primarily LSTM-based, which outperformed CNNs and Transformers on this task, and used residual connections for faster convergence and better performance. The final ensemble used hill climbing with negative weights allowed to combine diverse LSTM-based models, some with convolutional encoders or MemNN blocks.
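The hill-climbing ensemble mentioned above can be sketched as follows. This is a minimal illustration, not the team's code: the scoring function (R², written out by hand), the weight grid, the update rule `blend = (1 - w) * blend + w * model`, and the toy data are all assumptions chosen to show why allowing negative weights can help.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination for two equal-length lists."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def hill_climb(preds, y_true, weight_grid, max_steps=20, tol=1e-9):
    """Greedy blend search on validation predictions (hypothetical helper).

    preds: dict mapping model name -> list of validation predictions.
    weight_grid: candidate blend weights; may include negative values.
    Returns (weights, score); the weights always sum to 1.
    """
    # Start from the single best model.
    start = max(preds, key=lambda name: r2_score(y_true, preds[name]))
    weights = {name: 0.0 for name in preds}
    weights[start] = 1.0
    blend = list(preds[start])
    best = r2_score(y_true, blend)

    for _ in range(max_steps):
        candidate = None
        for name, p in preds.items():
            for w in weight_grid:
                trial = [(1 - w) * b + w * x for b, x in zip(blend, p)]
                score = r2_score(y_true, trial)
                if score > best + tol:
                    best, candidate = score, (name, w, trial)
        if candidate is None:
            break  # no single step improves the validation score
        name, w, blend = candidate
        weights = {n: (1 - w) * v for n, v in weights.items()}
        weights[name] += w
    return weights, best


# Toy data (assumed): both models share the same error pattern, "a" twice as strongly.
y_true = [0.0, 1.0, 2.0, 3.0, 4.0]
err = [0.5, -0.5, 0.5, -0.5, 0.5]
preds = {
    "a": [y + e for y, e in zip(y_true, err)],
    "b": [y + 0.5 * e for y, e in zip(y_true, err)],
}
grid = [i / 10 for i in range(-10, 11) if i != 0]
weights, score = hill_climb(preds, y_true, grid)
```

In the toy data the two models make correlated errors, so subtracting some of the weaker model cancels part of the stronger model's error. A grid restricted to positive weights cannot find that combination.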
Key takeaway
For data scientists and ML engineers tackling multi-label regression, consider staged training: first train one model on all labels, then fine-tune it separately for each group of related labels. The team paired this with LSTM architectures and a cosine learning rate scheduler with restarts, which can help the optimizer escape local minima. In this competition, group-wise fine-tuning improved the score by at least 0.001.
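To make the schedule concrete, here is a small sketch of a cosine learning rate with one restart, using the two periods of 3 and 6 epochs from the summary. The base learning rate, the minimum learning rate, and the function name are assumptions; the source does not state the team's values.

```python
import math


def cosine_restart_lr(epoch, base_lr, min_lr=0.0, periods=(3, 6)):
    """Learning rate at a (possibly fractional) epoch.

    Each period decays from base_lr to min_lr along half a cosine wave,
    then the next period restarts at base_lr. base_lr and min_lr are
    illustrative parameters, not values from the original write-up.
    """
    start = 0.0
    for length in periods:
        if epoch < start + length:
            frac = (epoch - start) / length  # progress within this period, in [0, 1)
            return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * frac))
        start += length
    return min_lr  # past the last period


# Learning rate at the start of each of the 9 epochs (base_lr is assumed).
schedule = [cosine_restart_lr(e, base_lr=1e-3) for e in range(9)]
```

For readers using PyTorch, `torch.optim.lr_scheduler.CosineAnnealingWarmRestarts` with `T_0=3` and `T_mult=2` produces the same cycle lengths of 3 and 6 epochs. Whether the team used that class is not stated in the source.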
Key insights
In this competition, group-wise fine-tuning and LSTM-based architectures both improved multi-label regression performance.
Principles
- Fine-tuning by label groups improves multi-label regression.
- LSTMs can outperform CNNs/Transformers in specific tasks.
- Residual connections accelerate model convergence.
Method
Train a multi-label model on all targets, then fine-tune it iteratively for specific label groups, adjusting the loss to focus on one group at a time.
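The method above can be sketched in two pieces: a loss restricted to one label group, and a step that assembles final predictions from the group-specific models. Everything here is a hedged illustration. The toy grouping of six labels into three groups, the squared-error placeholder loss, and the idea of stitching each group's columns from its own fine-tuned model are assumptions, not details confirmed by the source.

```python
def group_loss(y_pred, y_true, group_cols, elementwise=lambda d: d * d):
    """Mean per-element loss over only the columns in group_cols.

    y_pred, y_true: lists of rows, one value per label.
    elementwise: per-element loss on the residual; squared error is a
    placeholder, any per-element loss can be plugged in.
    """
    total, count = 0.0, 0
    for p_row, t_row in zip(y_pred, y_true):
        for c in group_cols:
            total += elementwise(p_row[c] - t_row[c])
            count += 1
    return total / count


def stitch_predictions(group_preds, groups, n_labels):
    """Build final predictions by taking each group's columns from the
    model fine-tuned for that group (assumed assembly step)."""
    n_rows = len(next(iter(group_preds.values())))
    out = [[0.0] * n_labels for _ in range(n_rows)]
    for name, cols in groups.items():
        for r in range(n_rows):
            for c in cols:
                out[r][c] = group_preds[name][r][c]
    return out


# Hypothetical grouping: 6 labels split into 3 groups of related labels.
groups = {"g1": [0, 1], "g2": [2, 3], "g3": [4, 5]}

# Stand-ins for the full-label outputs of three fine-tuned models.
group_preds = {
    "g1": [[1.0] * 6],
    "g2": [[2.0] * 6],
    "g3": [[3.0] * 6],
}
final = stitch_predictions(group_preds, groups, n_labels=6)
```

During fine-tuning for a given group, only `group_loss` for that group would drive the gradient, so the model is free to trade accuracy on the other labels for accuracy on the group in focus.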
In practice
- Use Smooth L1 loss for stable multi-label regression.
- Implement an auxiliary difference loss for related labels.
- Employ cosine scheduler with restarts for training.
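The two loss terms in the list above can be combined as in the sketch below. The Smooth L1 formula is the standard one (quadratic below `beta`, linear above). The source does not define the auxiliary difference loss, so this example assumes one plausible reading: a Smooth L1 penalty on the differences between neighbouring labels. The `aux_weight` parameter is likewise hypothetical.

```python
def smooth_l1(diff, beta=1.0):
    """Standard Smooth L1 on a single residual."""
    a = abs(diff)
    if a < beta:
        return 0.5 * a * a / beta
    return a - 0.5 * beta


def combined_loss(pred, true, aux_weight=0.5, beta=1.0):
    """Smooth L1 on the labels plus Smooth L1 on neighbouring differences.

    pred, true: equal-length sequences ordered so that adjacent labels
    are related (an assumption about what "related labels" means).
    """
    main = sum(smooth_l1(p - t, beta) for p, t in zip(pred, true)) / len(pred)

    # First differences between adjacent labels.
    d_pred = [b - a for a, b in zip(pred, pred[1:])]
    d_true = [b - a for a, b in zip(true, true[1:])]
    aux = sum(smooth_l1(p - t, beta) for p, t in zip(d_pred, d_true)) / len(d_pred)

    return main + aux_weight * aux


# A constant offset is penalised by the main term only; the shape matches,
# so the difference term contributes nothing.
offset_loss = combined_loss([1.0, 2.0, 3.0], [0.0, 1.0, 2.0])
```

Under this reading, the auxiliary term rewards predictions whose shape across related labels matches the target, independently of any constant bias.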
Topics
- Group Fine-tuning
- LSTM Architectures
- Ensemble Learning
- Loss Functions
- Kaggle Competition
Best for: Machine Learning Engineer, Data Scientist, AI Student
Editorial summary, takeaway, and curation by AIssential. Original article published by Kaggle.