Need reliable source for 30+ years of S&P 500 historical data for LSTM/Transformer research [P]
Summary
A researcher is seeking a reliable source of long-term historical S&P 500 data, specifically daily OHLCV (Open, High, Low, Close, Volume) bars covering 30 or more years, for a Master's-level project on financial time-series forecasting with LSTM and Transformer models. Yahoo Finance downloads have proven inconsistent, and most Kaggle datasets cover only 5-10 years. The researcher also asks which data sources are commonly used in academic financial forecasting (for example Alpha Vantage, WRDS/CRSP, Polygon, and Tiingo), and whether S&P 500 index data alone is sufficient or should be supplemented with technical indicators, macroeconomic factors, sentiment, or constituent stock data. Polygon.io is suggested for extensive historical data, while CRSP is noted as the standard for serious academic research and is often accessible through university libraries.
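Because inconsistent downloads are the core complaint, it can help to sanity-check any OHLCV file before training on it. The following is a minimal sketch in plain Python. The `Bar` record and `validate_bars` function are hypothetical names for illustration, not part of any data provider's API. The checker flags duplicate or out-of-order dates, unusually long gaps between trading days, bars whose high or low contradicts the open and close, non-positive prices, and negative volume. The 7-day gap threshold is an assumption: genuine market closures (for example, the gap from September 10 to September 17, 2001) can come close to it.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Bar:
    """One daily OHLCV record (hypothetical container, not a provider's schema)."""
    day: date
    open: float
    high: float
    low: float
    close: float
    volume: float


def validate_bars(bars: Sequence[Bar], max_gap_days: int = 7) -> List[Tuple[int, str]]:
    """Return (index, reason) for every suspicious bar; an empty list means nothing was flagged."""
    problems: List[Tuple[int, str]] = []
    prev_day = None
    for i, b in enumerate(bars):
        if prev_day is not None:
            if b.day <= prev_day:
                # Duplicate or out-of-order row.
                problems.append((i, "date not strictly increasing"))
            elif (b.day - prev_day).days > max_gap_days:
                # Possibly missing rows; the threshold is a heuristic assumption.
                problems.append((i, f"gap of {(b.day - prev_day).days} days"))
        if b.high < max(b.open, b.close, b.low):
            problems.append((i, "high below open/close/low"))
        if b.low > min(b.open, b.close, b.high):
            problems.append((i, "low above open/close/high"))
        if min(b.open, b.high, b.low, b.close) <= 0:
            problems.append((i, "non-positive price"))
        if b.volume < 0:
            # Zero volume is deliberately not flagged.
            problems.append((i, "negative volume"))
        prev_day = b.day
    return problems


# Made-up bars that each trigger one check.
bars = [
    Bar(date(2020, 1, 2), 100.0, 101.0, 99.0, 100.5, 1000),
    Bar(date(2020, 1, 3), 100.5, 100.0, 99.5, 100.2, 900),    # high below open
    Bar(date(2020, 1, 3), 100.2, 101.0, 100.0, 100.8, 800),   # duplicate date
    Bar(date(2020, 1, 20), 101.0, 102.0, 100.5, 101.5, 700),  # 17-day gap
]
for idx, reason in validate_bars(bars):
    print(idx, reason)
```

A non-empty result signals that the affected date range should be re-downloaded or cross-checked against a second source.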
Key takeaway
Researchers building financial time-series forecasting models should prioritize high-quality, long-term data from established academic sources such as CRSP, which is often available through a university affiliation. Daily OHLCV data is enough to study price trends, but augmenting S&P 500 index data with macroeconomic indicators or constituent stock data can improve model robustness and opens the door to more impactful applications such as smart beta portfolio optimization.
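Augmenting daily index data with macroeconomic indicators raises an alignment problem. Macro series are usually published monthly or quarterly, and only after the period they describe. One common approach, sketched here under stated assumptions, is an as-of join: each trading day receives the latest value already released on or before that day. The join is keyed on the release date rather than the reference period, so the model never sees a value before it became public. `asof_align` is a hypothetical helper, and the indicator values are made up.

```python
from bisect import bisect_right
from datetime import date
from typing import List, Optional, Sequence


def asof_align(trading_days: Sequence[date],
               release_days: Sequence[date],
               values: Sequence[float]) -> List[Optional[float]]:
    """For each trading day, return the most recent value released on or before that day.

    Assumes release_days is sorted ascending and parallel to values, and that a value
    released on day d is usable for that day's observation. Days before the first
    release get None.
    """
    if len(release_days) != len(values):
        raise ValueError("release_days and values must have the same length")
    aligned: List[Optional[float]] = []
    for d in trading_days:
        k = bisect_right(release_days, d)  # number of releases on or before d
        aligned.append(values[k - 1] if k > 0 else None)
    return aligned


# Hypothetical monthly indicator with made-up values, keyed by publication date.
releases = [date(2020, 1, 10), date(2020, 2, 12)]
readings = [1.5, 1.7]
days = [date(2020, 1, 2), date(2020, 1, 10), date(2020, 2, 3), date(2020, 2, 14)]
print(asof_align(days, releases, readings))
```

Keying on the publication date matters because a reading for one month is typically published in a later month. Aligning it to the month it describes would leak information that traders did not yet have.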
Key insights
Reliable, long-term financial data is crucial for robust time-series forecasting research.
Principles
- Historical trends do not guarantee future outcomes.
- Stock market data reflects divergences from market participants' models.
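The first principle has a practical consequence for evaluation: a model should be tested only on data that comes strictly after the data it was trained on. The following minimal sketch assumes a plain list of closing prices. It converts prices to log returns (a common modeling choice, not one stated in the source) and slices the series into fixed-length lookback windows of the kind LSTM and Transformer models consume. It then splits the windows chronologically instead of shuffling them. The function names, the 3-step lookback, and the 80/20 split are illustrative assumptions.

```python
import math
from typing import List, Sequence, Tuple


def log_returns(closes: Sequence[float]) -> List[float]:
    """Daily log returns r_t = ln(close_t / close_{t-1}); one element shorter than closes."""
    return [math.log(cur / prev) for prev, cur in zip(closes, closes[1:])]


def make_windows(series: Sequence[float], lookback: int) -> Tuple[List[List[float]], List[float]]:
    """Pair each run of `lookback` consecutive values with the value that follows it."""
    X, y = [], []
    for t in range(lookback, len(series)):
        X.append(list(series[t - lookback:t]))  # inputs: the `lookback` values before t
        y.append(series[t])                      # target: the value at t
    return X, y


def chronological_split(X, y, train_frac: float = 0.8):
    """Split windows in time order (no shuffling): every test target comes after every training target."""
    cut = int(len(X) * train_frac)
    return (X[:cut], y[:cut]), (X[cut:], y[cut:])


# Made-up closing prices for illustration.
closes = [100.0, 101.0, 100.5, 102.0, 103.0, 102.5, 104.0, 105.0, 104.5, 106.0, 107.0]
rets = log_returns(closes)                # 10 returns
X, y = make_windows(rets, lookback=3)     # 7 (window, target) pairs
(train_X, train_y), (test_X, test_y) = chronological_split(X, y)
print(len(train_X), len(test_X))
```

Any normalization statistics, such as the mean and standard deviation used to scale inputs, would likewise be computed on the training slice only. This keeps information from the test period out of training.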
Method
CRSP is a standard data source for academic financial forecasting. It offers comprehensive historical data and is often available through university libraries.
In practice
- Consider Polygon.io for extensive historical data.
- Explore Tiingo's free plan for financial data.
- Inquire about CRSP access through university libraries.
Topics
- S&P 500 Data
- Financial Time-Series Forecasting
- LSTM Models
- Transformer Models
- OHLCV Data
Best for: Research Scientist, AI Scientist, Machine Learning Engineer, AI Student
Editorial summary, takeaway, and curation by AIssential. Original article published by Machine Learning.