Need reliable source for 30+ years of S&P 500 historical data for LSTM/Transformer research [P]

Source: Machine Learning · Field: Finance & Economics — Capital Markets & Investment Management, FinTech & Digital Financial Services, Artificial Intelligence & Machine Learning · Depth: Advanced, quick

Summary

A researcher is looking for reliable long-term S&P 500 history, specifically about 30 years of daily OHLCV (Open, High, Low, Close, Volume) data, for a master's-level project on financial time-series forecasting with LSTM and Transformer models. Yahoo Finance downloads have been inconsistent, and most Kaggle datasets cover only 5-10 years. The researcher also asks which sources academic financial forecasting work commonly uses (Alpha Vantage, WRDS/CRSP, Polygon, Tiingo) and whether S&P 500 index data alone is sufficient or whether technical indicators, macroeconomic factors, sentiment, or constituent stock data should be added. Polygon.io is suggested for extensive data, while CRSP is noted as a standard for serious academic research that is often accessible through university libraries.
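Because the complaint about inconsistent downloads is really a data-quality problem, it can help to run basic sanity checks on any daily OHLCV file before training on it, whichever provider it came from. The following is a minimal sketch using only the Python standard library. The function name `validate_ohlcv`, the row layout (dicts with `date`, `open`, `high`, `low`, `close`, `volume`), the 5-day gap threshold, and the sample rows are all assumptions made for illustration, not part of the original post or of any provider's API.

```python
from datetime import date


def validate_ohlcv(rows, max_gap_days=5):
    """Return a list of (date, message) issues found in daily OHLCV rows.

    `rows` is a list of dicts with keys date, open, high, low, close, volume,
    in the order they were delivered by the data source.
    `max_gap_days` is an assumed threshold; weekends and holidays produce
    gaps of up to about 4 calendar days in normal conditions.
    """
    issues = []
    prev_date = None
    for row in rows:
        d = row["date"]
        o, h, l, c, v = (row[k] for k in ("open", "high", "low", "close", "volume"))

        # The high must be the largest price of the day, the low the smallest.
        if h < max(o, c, l):
            issues.append((d, "high is below open, close, or low"))
        if l > min(o, c, h):
            issues.append((d, "low is above open, close, or high"))
        if min(o, h, l, c) <= 0:
            issues.append((d, "non-positive price"))
        if v < 0:
            issues.append((d, "negative volume"))

        # Dates should be strictly increasing, without long unexplained gaps.
        if prev_date is not None:
            if d <= prev_date:
                issues.append((d, "duplicate or out-of-order date"))
            elif (d - prev_date).days > max_gap_days:
                gap = (d - prev_date).days
                issues.append((d, f"gap of {gap} days since previous row"))
        prev_date = d
    return issues


# Made-up rows containing three deliberate problems.
sample = [
    {"date": date(2024, 1, 2), "open": 100.0, "high": 102.0, "low": 99.0, "close": 101.0, "volume": 1000},
    {"date": date(2024, 1, 3), "open": 101.0, "high": 100.0, "low": 99.5, "close": 100.5, "volume": 1200},
    {"date": date(2024, 1, 3), "open": 100.5, "high": 101.0, "low": 100.0, "close": 100.8, "volume": 900},
    {"date": date(2024, 1, 12), "open": 100.8, "high": 103.0, "low": 100.1, "close": 102.5, "volume": 1500},
]

issues = validate_ohlcv(sample)
for when, message in issues:
    print(when.isoformat(), message)
```

Flagged rows are candidates for review rather than automatic removal: a long gap can be a real market closure, so comparing the same dates across two providers is a reasonable next step.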

Key takeaway

Research scientists developing financial time-series forecasting models should prioritize securing high-quality, long-term data from established academic sources such as CRSP, which is often available through university affiliations. Daily OHLCV data can be used to model trends, but consider augmenting S&P 500 index data with macroeconomic indicators or constituent stock data to make models more robust and to explore more impactful applications such as smart beta portfolio optimization.
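Augmenting daily index data with macroeconomic indicators raises an alignment question, since macro series are typically published monthly or quarterly. One common approach is an as-of join: each trading day gets the most recent value released on or before that day, so the model never sees a figure before it was public. The sketch below illustrates the idea with the standard library. The function name `align_asof`, the release dates, and the indicator values are hypothetical and chosen only for the example.

```python
from bisect import bisect_right
from datetime import date


def align_asof(daily_dates, release_dates, release_values):
    """For each trading date, return the latest value released on or before it.

    `release_dates` must be sorted ascending and parallel to `release_values`.
    Trading dates earlier than the first release get None.
    """
    aligned = []
    for d in daily_dates:
        # Number of releases dated on or before d.
        i = bisect_right(release_dates, d)
        aligned.append(release_values[i - 1] if i > 0 else None)
    return aligned


# Hypothetical trading days and made-up macro releases.
trading_days = [
    date(2023, 12, 11),
    date(2024, 1, 2),
    date(2024, 1, 3),
    date(2024, 1, 11),
    date(2024, 1, 12),
]
release_dates = [date(2023, 12, 12), date(2024, 1, 11)]
release_values = [3.1, 3.4]

macro_feature = align_asof(trading_days, release_dates, release_values)
print(macro_feature)
```

This sketch treats a value as usable on its release day. If a figure is published after the market close, shifting release dates forward by one trading day is the more conservative choice.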

Key insights

Reliable, long-term financial data is crucial for robust time-series forecasting research.

Principles

Method

For academic financial forecasting, CRSP is a standard data source, often available via university libraries, offering comprehensive historical data.

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, AI Student

Editorial summary, takeaway, and curation by AIssential. Original article published by Machine Learning.