Kaggle Solution Walkthroughs: Enefit - Predict Energy Behavior of Prosumers with Team 预测多了一点
Summary
The second-place team in the Enefit competition details its solution, which combined three distinct models: a neural network (NN), LightGBM, and CatBoost. The team found that transforming the original target into a ratio relative to historical target values improved results by roughly 1 to 2 points. Key features included the ratio of direct to surface solar radiation, along with features derived from historical electricity values and weather conditions. The team used online training, and the full training run inside a submission took approximately six hours on Kaggle kernels. For feature selection, they built validation datasets and kept the top 50 features, many of which were ratios. The NN performed best individually, so the final ensemble weighted it at 0.5 and LightGBM and CatBoost at 0.25 each.
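The summary does not say how the validation datasets were used to rank features. One common option is permutation importance: shuffle one feature column of the held-out data at a time and measure how much the error grows. The sketch below assumes a fitted model exposed as a plain `predict` function and uses mean absolute error as the score; `permutation_importance` and `select_top_k` are hypothetical helpers, not the team's code.

```python
import numpy as np


def permutation_importance(predict, X_val, y_val, seed=0):
    """Increase in validation MAE when each feature column is shuffled.

    `predict` is any fitted model's prediction function (an assumption:
    LightGBM, CatBoost, or NN models would be wrapped to fit this shape).
    """
    X_val = np.asarray(X_val, dtype=float)
    y_val = np.asarray(y_val, dtype=float)
    rng = np.random.default_rng(seed)
    base_mae = np.mean(np.abs(predict(X_val) - y_val))
    scores = np.empty(X_val.shape[1])
    for j in range(X_val.shape[1]):
        X_perm = X_val.copy()
        # Break the link between feature j and the target, keep its distribution.
        X_perm[:, j] = rng.permutation(X_perm[:, j])
        scores[j] = np.mean(np.abs(predict(X_perm) - y_val)) - base_mae
    return scores


def select_top_k(predict, X_val, y_val, feature_names, k=50):
    """Return the k feature names whose shuffling hurts validation MAE most."""
    scores = permutation_importance(predict, X_val, y_val)
    order = np.argsort(scores)[::-1][:k]
    return [feature_names[i] for i in order]
```

With a toy linear model where feature "a" carries most of the signal, "c" a little, and "b" none, `select_top_k(lambda M: M @ w, X, y, ["a", "b", "c"], k=2)` keeps "a" and "c". Scoring on a held-out set rather than the training data is what keeps this selection from simply rewarding overfit features.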
Key takeaway
For data scientists and ML engineers building forecasting models, consider transforming the target variable into a ratio relative to its historical values, since this yielded significant performance gains here. For datasets that receive regular updates, prioritize online training so the models use the latest available data. Additionally, focus on ratio-based features built from historical and environmental data, which proved highly impactful for predictive accuracy in this solution.
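A minimal sketch of the two kinds of ratio features mentioned above, assuming hourly NumPy arrays for direct and surface solar radiation and for the target. The 48-hour and 168-hour lags, the feature names, and the helpers `safe_ratio`, `lag`, and `build_ratio_features` are illustrative assumptions, not the team's exact definitions.

```python
import numpy as np


def safe_ratio(num, den):
    """num / den, with NaN wherever den is 0 (e.g. radiation at night)."""
    num = np.asarray(num, dtype=float)
    den = np.asarray(den, dtype=float)
    return np.divide(num, den, out=np.full(num.shape, np.nan), where=den != 0)


def lag(x, k):
    """Shift an hourly series forward by k > 0 steps, padding the start with NaN."""
    x = np.asarray(x, dtype=float)
    out = np.full(x.shape, np.nan)
    out[k:] = x[:-k]
    return out


def build_ratio_features(direct_rad, surface_rad, target, lag_short=48, lag_long=168):
    """Hypothetical ratio features: radiation mix and a historical target ratio."""
    return {
        # Share of surface radiation arriving as direct sunlight.
        "direct_to_surface_radiation": safe_ratio(direct_rad, surface_rad),
        # Recent electricity level relative to the level one week back.
        "target_lag48_to_lag168": safe_ratio(lag(target, lag_short), lag(target, lag_long)),
    }
```

Leaving undefined ratios as NaN suits LightGBM and CatBoost, which handle missing values natively; an NN input pipeline would need to fill or mask them.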
Key insights
Target variable transformation and online training significantly enhance forecasting model performance.
Principles
- Online training (retraining as new data arrives) outperformed offline training in this competition.
- Feature engineering with ratios improves model accuracy.
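To make the online-versus-offline contrast concrete, here is a minimal sketch of a model that is refit on all data seen so far whenever a new batch is released. Ridge regression stands in for the NN, LightGBM, and CatBoost models purely so the example stays self-contained; `OnlineRetrainer` and its `refit_every` parameter are hypothetical.

```python
import numpy as np


class OnlineRetrainer:
    """Refit on every accumulated batch; ridge regression is a stand-in model."""

    def __init__(self, alpha=1e-3, refit_every=1):
        self.alpha = alpha              # ridge penalty
        self.refit_every = refit_every  # refit after this many new batches
        self.X, self.y = [], []
        self.w = None
        self.n_batches = 0

    def update(self, X_new, y_new):
        """Store a newly released batch and refit on schedule."""
        self.X.append(np.asarray(X_new, dtype=float))
        self.y.append(np.asarray(y_new, dtype=float))
        self.n_batches += 1
        if self.n_batches % self.refit_every == 0:
            self._fit()

    def _fit(self):
        X = np.vstack(self.X)
        y = np.concatenate(self.y)
        A = X.T @ X + self.alpha * np.eye(X.shape[1])
        self.w = np.linalg.solve(A, X.T @ y)

    def predict(self, X):
        if self.w is None:
            raise RuntimeError("no model fitted yet")
        return np.asarray(X, dtype=float) @ self.w
```

In a toy setting where the relationship shifts from y = x to y = 3x after the first batch, a model frozen after the first batch keeps a slope of about 1, while the retrained one moves to about 2.5 after three more batches and tracks the new regime more closely. Weighting recent batches more heavily would move it further, at the cost of extra tuning.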
Method
Train separate models for consumption and production using online training. Transform the targets into ratios relative to historical target values. Use cross-validation (4-fold for the NN, 6-fold for the boosting models) and combine the models with a weighted average whose weights were chosen from public leaderboard scores.
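A minimal sketch of the final blending step. Each model's fold models (4 for the NN, 6 for LightGBM and CatBoost, per the description) are first averaged, and the three averages are then combined with the reported 0.5 / 0.25 / 0.25 weights. Plain fold averaging and the helper names `average_folds` and `blend` are assumptions for illustration.

```python
import numpy as np


def average_folds(fold_predictions):
    """Mean of the predictions from each cross-validation fold model."""
    return np.mean(np.stack([np.asarray(p, dtype=float) for p in fold_predictions]), axis=0)


def blend(preds, weights):
    """Weighted average of per-model predictions; weights are renormalized."""
    total = sum(weights.values())
    return sum(weights[name] * np.asarray(preds[name], dtype=float) for name in weights) / total


WEIGHTS = {"nn": 0.5, "lightgbm": 0.25, "catboost": 0.25}
```

For example, with fold-averaged predictions of [2, 4] for the NN, [1, 1] for LightGBM, and [3, 5] for CatBoost, `blend` returns [2.0, 3.5]. Renormalizing by the weight total keeps the output on the target's scale even if the weights are later retuned and no longer sum to 1.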
In practice
- Calculate target ratios with historical values.
- Use online training for regularly updated datasets.
- Engineer features from weather and historical electricity data.
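The target-ratio idea from the first bullet can be sketched as a forward transform used for training and an inverse transform applied to predictions. Here the historical reference is assumed to be the target 48 hours earlier; that lag, the choice to mask rows with a zero or missing reference, and the helpers `to_ratio_target` and `from_ratio_prediction` are all assumptions rather than the team's documented choices.

```python
import numpy as np


def to_ratio_target(target, hist_target):
    """Training target: current value divided by its historical reference.

    Rows whose reference is zero or missing (e.g. solar production at night)
    get NaN and are flagged False in the mask so they can be dropped.
    """
    target = np.asarray(target, dtype=float)
    hist = np.asarray(hist_target, dtype=float)
    ratio = np.full(target.shape, np.nan)
    ok = np.isfinite(hist) & (hist > 0)
    ratio[ok] = target[ok] / hist[ok]
    return ratio, ok


def from_ratio_prediction(ratio_pred, hist_target):
    """Map a predicted ratio back to the original target scale."""
    return np.asarray(ratio_pred, dtype=float) * np.asarray(hist_target, dtype=float)
```

The model then learns relative change instead of absolute level, which removes much of the scale difference between small and large prosumer segments; the inverse transform restores the original units before scoring.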
Topics
- Neural Networks
- Gradient Boosting Models
- Feature Engineering
- Online Machine Learning
- Ensemble Modeling
Best for: Machine Learning Engineer, Data Scientist, AI Student
Editorial summary, takeaway, and curation by AIssential. Original article published by Kaggle.