Kaggle Winners Walkthroughs: Jane Street Real-Time Market Data Forecasting with Team Patrick Yam
Summary
A Kaggle Grandmaster details a winning solution for the Jane Street Real-Time Market Data Forecasting competition, a financial market data forecasting challenge, built on a modified Axial Transformer. The architecture replaces one self-attention layer, the one along the time axis, with a Gated Recurrent Unit (GRU) that runs over each asset's time series. It also processes an entire day's data as a single 2D input (time steps by assets), so the model captures both time-series and cross-sectional information. Feature engineering is minimal: the only addition is a Gaussian-ranked "time of day" feature. Instead of traditional cross-validation, training relies on seed ensembling, in which the same model is trained with multiple random seeds. This is combined with multitask learning and a custom weighted R-squared loss function. Crucially, the solution performs daily online learning with an Adam optimizer. A fast inference module parallelizes predictions across the ensemble and completes inference for 17 models in 2.5 hours, well within the 9-hour limit.
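To make the "whole day as one 2D input" idea concrete, here is a minimal NumPy sketch. It scatters one day of long-format rows (one row per time step and asset) into a dense time-by-asset grid, with features as channels. Several parts are assumptions for illustration and are not the team's published code: the function names, the zero-fill-plus-mask handling of missing cells, and the reading of "Gaussian-ranked" as a rank-Gauss transform of the intraday time index.

```python
import numpy as np
from statistics import NormalDist


def rank_gauss(values):
    """Rank-Gauss transform: ranks -> evenly spaced standard-normal quantiles."""
    values = np.asarray(values)
    ranks = values.argsort().argsort()          # 0..n-1 (ties broken by position)
    quantiles = (ranks + 0.5) / len(values)     # strictly inside (0, 1)
    inv_cdf = NormalDist().inv_cdf
    return np.array([inv_cdf(float(q)) for q in quantiles], dtype=np.float32)


def build_day_grid(time_ids, symbol_ids, features, n_symbols):
    """Scatter one day's rows into a dense (T, S, F + 1) grid plus a (T, S) mask.

    Axis 0 is intraday time and axis 1 is the asset. The last axis holds the raw
    features plus one rank-Gauss "time of day" channel (an assumed reading of
    the write-up). Missing (time, asset) cells stay zero and are False in mask.
    """
    time_ids = np.asarray(time_ids)
    symbol_ids = np.asarray(symbol_ids)
    features = np.asarray(features, dtype=np.float32)

    unique_times, t_idx = np.unique(time_ids, return_inverse=True)
    n_times = len(unique_times)

    grid = np.zeros((n_times, n_symbols, features.shape[1]), dtype=np.float32)
    mask = np.zeros((n_times, n_symbols), dtype=bool)
    grid[t_idx, symbol_ids] = features
    mask[t_idx, symbol_ids] = True

    # Same time-of-day value for every asset at a given time step.
    tod = rank_gauss(unique_times)[:, None, None]
    tod = np.broadcast_to(tod, (n_times, n_symbols, 1))
    grid = np.concatenate([grid, tod], axis=-1)
    return grid, mask


# Example: 3 time steps, 3 assets, 2 raw features; asset 2 never appears.
grid, mask = build_day_grid(
    time_ids=[0, 0, 1, 2, 2],
    symbol_ids=[0, 1, 1, 0, 1],
    features=np.arange(10).reshape(5, 2),
    n_symbols=3,
)
print(grid.shape)  # (3, 3, 3)
```

The mask lets a downstream model ignore assets that did not trade at a given time step instead of treating the zero fill as real data.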
Key takeaway
For AI Data Scientists developing financial forecasting models, consider 2D transformer architectures that process time-series and cross-sectional data simultaneously. Daily online learning can help models adapt to shifting market conditions, and seed ensembling can improve robustness. Prioritize fast inference techniques to meet strict deployment time limits. One example is replacing each model's separate linear modules with stacked weight tensors, so the whole ensemble is evaluated in one batched operation.
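One way to read "replacing linear modules with stacked weights" is to stack the parameters of all ensemble members into tensors with a leading model axis. A single batched contraction then replaces K separate forward passes. The NumPy sketch below assumes each member is a small two-layer MLP, a hypothetical stand-in for the real models, and checks that the stacked path matches the per-model loop. In a GPU framework, the same pattern would turn many small kernel launches into a few large ones, which is the likely source of the speed-up.

```python
import numpy as np

rng = np.random.default_rng(0)


def make_mlp(d_in, d_hidden, d_out):
    """One small two-layer MLP standing in for a single ensemble member (hypothetical)."""
    return {
        "W1": rng.standard_normal((d_in, d_hidden)) * 0.1,
        "b1": rng.standard_normal(d_hidden) * 0.1,
        "W2": rng.standard_normal((d_hidden, d_out)) * 0.1,
        "b2": rng.standard_normal(d_out) * 0.1,
    }


def mlp_forward(p, x):
    """Reference path: evaluate one member at a time."""
    h = np.maximum(x @ p["W1"] + p["b1"], 0.0)
    return h @ p["W2"] + p["b2"]


def stack_models(models):
    """Stack every parameter along a new leading axis K (one slot per member)."""
    return {name: np.stack([m[name] for m in models]) for name in models[0]}


def stacked_forward(sp, x):
    """Evaluate all K members on the same batch x of shape (N, d_in) -> (K, N, d_out)."""
    h = np.einsum("nd,kdh->knh", x, sp["W1"]) + sp["b1"][:, None, :]
    h = np.maximum(h, 0.0)
    return np.einsum("knh,kho->kno", h, sp["W2"]) + sp["b2"][:, None, :]


models = [make_mlp(8, 16, 1) for _ in range(17)]       # 17 members, matching the model count
x = rng.standard_normal((5, 8))
all_preds = stacked_forward(stack_models(models), x)   # shape (17, 5, 1)
ensemble_pred = all_preds.mean(axis=0)                 # simple average over members
```

Averaging along the model axis gives the seed-ensemble prediction without leaving the batched representation.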
Key insights
A modified Axial Transformer that combines a GRU with daily online learning delivered a winning solution for financial time-series forecasting.
Principles
- Combine time series and cross-sectional data.
- Online learning is critical for dynamic financial data.
- Seed ensembling improves model robustness.
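The online-learning principle can be illustrated with a predict-then-update loop. Each day, the model first predicts without labels. Once that day's labels are revealed, it takes a few Adam steps on them, and the optimizer state carries over from day to day. The sketch below is a minimal NumPy version that assumes a plain linear model, a mean-squared-error objective, and hand-picked `lr` and `steps_per_day` values. It stands in for the transformer described above, whose online-learning settings are not reproduced here. The synthetic data is also an assumption, with true coefficients that drift slowly across days.

```python
import numpy as np


class Adam:
    """Minimal Adam optimizer updating a dict of NumPy arrays in place."""

    def __init__(self, params, lr=1e-3, betas=(0.9, 0.999), eps=1e-8):
        self.params, self.lr, self.eps = params, lr, eps
        self.b1, self.b2 = betas
        self.m = {k: np.zeros_like(v) for k, v in params.items()}
        self.v = {k: np.zeros_like(v) for k, v in params.items()}
        self.t = 0

    def step(self, grads):
        self.t += 1
        for k, g in grads.items():
            self.m[k] = self.b1 * self.m[k] + (1 - self.b1) * g
            self.v[k] = self.b2 * self.v[k] + (1 - self.b2) * g * g
            m_hat = self.m[k] / (1 - self.b1 ** self.t)   # bias correction
            v_hat = self.v[k] / (1 - self.b2 ** self.t)
            self.params[k] -= self.lr * m_hat / (np.sqrt(v_hat) + self.eps)


def predict(params, x):
    return x @ params["w"] + params["b"]


def mse_grads(params, x, y):
    """Gradients of mean squared error for the linear model."""
    err = predict(params, x) - y
    return {"w": 2.0 * x.T @ err / len(y), "b": np.array([2.0 * err.mean()])}


def run_online(days, params, lr=0.05, steps_per_day=20):
    """For each day: predict first, then update on that day's revealed labels."""
    opt = Adam(params, lr=lr)          # created once, so its state persists across days
    daily_mse = []
    for x, y in days:
        preds = predict(params, x)     # labels are not yet available here
        daily_mse.append(float(np.mean((preds - y) ** 2)))
        for _ in range(steps_per_day):
            opt.step(mse_grads(params, x, y))
    return daily_mse


# Synthetic days whose true coefficients drift slowly over time.
rng = np.random.default_rng(0)
days = []
for d in range(30):
    w_true = np.array([1.0, -2.0]) + 0.01 * d
    x = rng.standard_normal((200, 2))
    days.append((x, x @ w_true + 0.5))

params = {"w": np.zeros(2), "b": np.zeros(1)}
daily_mse = run_online(days, params)
```

Keeping a single optimizer instance across days preserves Adam's moment estimates, so each day's update builds on the previous ones rather than restarting.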
Method
The method combines a modified Axial Transformer with a GRU layer, minimal feature engineering, multitask learning with a weighted R-squared loss, seed ensembling, daily online learning, and a parallelized inference module for speed.
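The competition scores predictions with a sample-weighted, zero-mean R-squared: R^2 = 1 - sum(w * (y - y_hat)^2) / sum(w * y^2). The NumPy sketch below computes that score. It also computes a multitask loss that averages 1 - R^2 over several target columns. The write-up's exact loss is not given here, so the `task_weights` argument and the choice to minimise 1 - R^2 per target are assumptions. In training, the same expression would be written in an autodiff framework so it can be backpropagated; NumPy here only shows the arithmetic.

```python
import numpy as np


def weighted_r2(y_true, y_pred, w):
    """Sample-weighted, zero-mean R^2: 1 - sum(w*(y - y_hat)^2) / sum(w*y^2)."""
    y_true, y_pred, w = (np.asarray(a, dtype=float) for a in (y_true, y_pred, w))
    return 1.0 - np.sum(w * (y_true - y_pred) ** 2) / np.sum(w * y_true ** 2)


def multitask_r2_loss(Y_true, Y_pred, w, task_weights):
    """Task-weighted mean of (1 - weighted R^2) over K target columns.

    Y_true, Y_pred: (N, K); w: (N,) sample weights; task_weights: (K,),
    a hypothetical knob for emphasising the scored target over auxiliary ones.
    """
    Y_true = np.asarray(Y_true, dtype=float)
    Y_pred = np.asarray(Y_pred, dtype=float)
    w = np.asarray(w, dtype=float)[:, None]            # broadcast over targets
    task_weights = np.asarray(task_weights, dtype=float)
    per_task = np.sum(w * (Y_true - Y_pred) ** 2, axis=0) / np.sum(w * Y_true ** 2, axis=0)
    return float(np.sum(task_weights * per_task) / np.sum(task_weights))


y = np.array([1.0, -2.0, 3.0])
w = np.array([1.0, 2.0, 1.0])
print(weighted_r2(y, y, w))             # 1.0 (perfect prediction)
print(weighted_r2(y, np.zeros(3), w))   # 0.0 (always predicting zero)
```

Because the denominator uses y^2 rather than squared deviations from the mean, a model that always predicts zero scores exactly 0, and any score above 0 beats that baseline.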
In practice
- Replace the time-axis self-attention layer with a GRU.
- Use Optuna for optimizer hyperparameter tuning.
- Implement parallel inference for ensemble models.
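Replacing the time-axis self-attention with a GRU can be sketched as one simplified axial block over the (time, asset, feature) day grid. A GRU runs along the time axis for every asset in parallel, and then single-head self-attention mixes information across assets within each time step. This is a NumPy forward pass only, with several simplifications: no layer norm, no feed-forward sublayer, one head, one shared hidden size, and randomly initialised weights. The parameter names are hypothetical. The GRU is causal, and the attention only looks across assets at the same time step, so the output at time t depends only on inputs up to t. The example at the end checks this property.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def init_params(d, rng):
    """Random weights for the GRU gates and the Q/K/V projections (hypothetical names)."""
    s = 1.0 / np.sqrt(d)
    p = {name: rng.uniform(-s, s, (d, d))
         for name in ["Wz", "Uz", "Wr", "Ur", "Wh", "Uh", "Wq", "Wk", "Wv"]}
    for name in ["bz", "br", "bh"]:
        p[name] = np.zeros(d)
    return p


def gru_over_time(x, p):
    """Run one GRU along axis 0 (time) for every asset in parallel. x: (T, S, D)."""
    T, S, D = x.shape
    h = np.zeros((S, D))
    out = np.empty_like(x)
    for t in range(T):
        xt = x[t]
        z = sigmoid(xt @ p["Wz"] + h @ p["Uz"] + p["bz"])          # update gate
        r = sigmoid(xt @ p["Wr"] + h @ p["Ur"] + p["br"])          # reset gate
        h_tilde = np.tanh(xt @ p["Wh"] + (r * h) @ p["Uh"] + p["bh"])
        h = (1.0 - z) * h + z * h_tilde
        out[t] = h
    return out


def attention_across_assets(x, p, mask=None):
    """Single-head self-attention along axis 1 (assets), separately per time step."""
    q, k, v = x @ p["Wq"], x @ p["Wk"], x @ p["Wv"]                  # (T, S, D)
    scores = np.einsum("tsd,tud->tsu", q, k) / np.sqrt(x.shape[-1])
    if mask is not None:                       # mask: (T, S), True = asset present
        scores = np.where(mask[:, None, :], scores, -1e9)
    scores = scores - scores.max(axis=-1, keepdims=True)
    attn = np.exp(scores)
    attn = attn / attn.sum(axis=-1, keepdims=True)
    return np.einsum("tsu,tud->tsd", attn, v)


def axial_block(x, p, mask=None):
    """Residual GRU-over-time step, then residual attention-across-assets step."""
    x = x + gru_over_time(x, p)
    return x + attention_across_assets(x, p, mask)


rng = np.random.default_rng(0)
T, S, D = 6, 4, 8
params = init_params(D, rng)
x = rng.standard_normal((T, S, D))
y = axial_block(x, params)

x_future = x.copy()
x_future[-1] += 1.0                                 # perturb only the last time step
y_future = axial_block(x_future, params)
print(y.shape)                                      # (6, 4, 8)
print(np.allclose(y[:-1], y_future[:-1]))           # True: earlier steps unaffected
```

Keeping the time axis recurrent rather than attentional means that, at inference, the hidden state can in principle be carried forward one time step at a time instead of recomputing attention over the whole day.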
Topics
- Axial Transformer
- Time Series Forecasting
- Online Learning
- Multitask Learning
- Inference Optimization
Best for: AI Data Scientist, Machine Learning Engineer, Data Scientist
Editorial summary, takeaway, and curation by AIssential. Original article published by Kaggle.