Can LLM Embeddings Improve Time Series Forecasting? A Practical Feature Engineering Approach

2026-02-27 · Source: MachineLearningMastery.com · Field: Technology & Digital (Artificial Intelligence & Machine Learning, Data Science & Analytics, FinTech & Digital Financial Services) · Depth: Intermediate, medium

Summary

This article explores whether integrating large language model (LLM) embeddings as engineered features can enhance time series forecasting performance. It details a practical example using daily Dow Jones Industrial Average (DJIA) closing prices and corresponding financial news headlines from 2008 to 2016. The process involves building a baseline model with traditional time series features (lagged returns, rolling statistics) and a full model that additionally incorporates 20-dimensional PCA-reduced embeddings generated from news headlines using a SentenceTransformer model ("all-MiniLM-L6-v2"). Both models, trained with `LGBMClassifier`, predict the direction of the next day's DJIA return. The baseline model achieved an accuracy of 0.5, while the full model with embeddings achieved 0.50476, indicating only a marginal, practically insignificant improvement.
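The baseline feature set described above can be sketched roughly as follows. This is a minimal illustration, not the original article's code: the function name `make_baseline_features`, the lag choices, the 5-day window, and the synthetic price series are all assumptions made for the example.

```python
import numpy as np
import pandas as pd


def make_baseline_features(close: pd.Series, lags=(1, 2, 3), window=5) -> pd.DataFrame:
    """Build return-based features and a next-day direction target.

    Each row uses only information available at that day's close.
    The lag list and window length are illustrative assumptions.
    """
    ret = close.pct_change()
    feats = pd.DataFrame(index=close.index)
    feats["ret_0"] = ret  # the current day's return
    for k in lags:
        feats[f"ret_lag{k}"] = ret.shift(k)  # return k days earlier
    feats["roll_mean"] = ret.rolling(window).mean()
    feats["roll_std"] = ret.rolling(window).std()

    # Target: 1 if the NEXT day's return is positive, else 0.
    next_ret = ret.shift(-1)
    feats["target"] = (next_ret > 0).astype(int)

    # Drop the final row (no next-day return) and warm-up rows with NaN features.
    return feats[next_ret.notna()].dropna()


# Synthetic prices stand in for real DJIA closes (assumption for the demo).
rng = np.random.default_rng(0)
dates = pd.bdate_range("2016-01-04", periods=30)
close = pd.Series(100 * np.exp(np.cumsum(rng.normal(0, 0.01, size=30))), index=dates)
features = make_baseline_features(close)
print(features.head())
```

Shifting the return by -1 for the target while keeping every feature backward-looking is what keeps the setup free of look-ahead leakage.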

Key takeaway

Data scientists evaluating new feature engineering techniques for financial time series forecasting should establish a strong baseline with traditional features before adding LLM embeddings. The marginal gain observed here (accuracy of 0.5 versus 0.50476) suggests that LLM embeddings are not a universal solution: any improvement needs rigorous, context-specific validation across multiple experimental settings and time splits before it can be called consistent or statistically meaningful.
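One rough way to see why a gain this small calls for more validation is to compare it with the sampling noise of an accuracy estimate, and to repeat the comparison over several time-ordered splits. The sketch below is illustrative only: the helper names are hypothetical, the test-set sizes are made up because the summary does not state the real one, and the single-model standard error is a crude yardstick rather than a proper paired test.

```python
import math


def accuracy_standard_error(n_test: int, p: float = 0.5) -> float:
    """Binomial standard error of an accuracy estimate measured on n_test samples."""
    return math.sqrt(p * (1 - p) / n_test)


def expanding_window_splits(n_rows: int, n_splits: int, test_size: int):
    """Yield (train_indices, test_indices); training rows always precede test rows."""
    first_test_start = n_rows - n_splits * test_size
    if first_test_start <= 0:
        raise ValueError("not enough rows for the requested splits")
    for i in range(n_splits):
        start = first_test_start + i * test_size
        yield range(0, start), range(start, start + test_size)


gain = 0.50476 - 0.5  # accuracies reported in the summary
for n_test in (200, 2_000, 20_000):  # hypothetical test-set sizes
    se = accuracy_standard_error(n_test)
    print(f"n_test={n_test}: gain is {gain / se:.2f} standard errors")

# Each split trains on the past and tests on the block that follows it.
for train_idx, test_idx in expanding_window_splits(n_rows=10, n_splits=2, test_size=3):
    print(list(train_idx), "->", list(test_idx))
```

Evaluating both models on every split and checking whether the embedding model wins consistently is a more convincing signal than a single accuracy difference.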

Key insights

In this experiment, LLM embeddings of news headlines gave only a marginal improvement in next-day direction accuracy (0.5 to 0.50476), so any such gain needs careful validation before it is trusted.

Principles

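Traditional time series features (lagged returns, rolling statistics) provide the baseline that any new feature must beat.
Dimensionality reduction (here, PCA to 20 components) keeps the embedding features compact before they are merged with the traditional ones.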
Performance gains from LLM embeddings are context-dependent.

Method

Generate LLM embeddings from related text data, reduce dimensionality via PCA, then merge with traditional time series features to train and compare forecasting models against a baseline.
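The reduce-and-merge step can be sketched as below. Several things here are assumptions for illustration: random vectors stand in for real headline embeddings, there is one embedding row per trading day (the summary does not say how multiple headlines per day were combined), the helper name `add_embedding_features` is hypothetical, and fitting PCA on the training rows only is a leakage precaution chosen for this sketch rather than something the summary confirms.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA


def add_embedding_features(features: pd.DataFrame, embeddings: pd.DataFrame,
                           n_train: int, n_components: int = 20) -> pd.DataFrame:
    """Append PCA-reduced embedding columns to a date-indexed feature frame.

    Only dates present in both frames are kept. PCA is fitted on the first
    n_train aligned rows so that later (test) days do not shape the projection.
    """
    common = features.index.intersection(embeddings.index).sort_values()
    emb = embeddings.loc[common].to_numpy()

    pca = PCA(n_components=n_components, random_state=0)
    pca.fit(emb[:n_train])
    reduced = pca.transform(emb)

    emb_cols = [f"emb_pc{i}" for i in range(n_components)]
    emb_df = pd.DataFrame(reduced, index=common, columns=emb_cols)
    return features.loc[common].join(emb_df)


# Demo with synthetic stand-ins (assumptions, not real market or news data).
rng = np.random.default_rng(0)
dates = pd.bdate_range("2016-01-04", periods=60)
features = pd.DataFrame(rng.normal(size=(60, 2)), index=dates,
                        columns=["ret_lag1", "roll_std"])
# Pretend two days have no headlines; 384 columns mimic a sentence-embedding width.
embeddings = pd.DataFrame(rng.normal(size=(58, 384)), index=dates.delete([10, 30]))

full = add_embedding_features(features, embeddings, n_train=45)
print(full.shape)
```

The baseline model would then train on the traditional columns alone and the full model on all columns of `full`, using the same rows and the same chronological split.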

In practice

Use `yfinance` for stock data retrieval.
Apply `SentenceTransformer` for text embeddings.
Employ `LGBMClassifier` for forecasting models.
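A rough sketch of how these three libraries might be called is shown below. It is not the original article's code: the ticker symbol `^DJI`, the date range, the helper names, and the use of default `LGBMClassifier` hyperparameters are assumptions. The heavy imports sit inside the functions so the module loads even where those packages or a network connection are unavailable.

```python
import pandas as pd


def load_djia_close(start: str = "2008-01-01", end: str = "2017-01-01") -> pd.Series:
    """Download daily DJIA closes via yfinance (needs network; ticker is an assumption)."""
    import yfinance as yf

    history = yf.Ticker("^DJI").history(start=start, end=end)
    return history["Close"]


def embed_texts(texts):
    """Encode strings with the SentenceTransformer model named in the summary."""
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer("all-MiniLM-L6-v2")
    return model.encode(list(texts))  # array with one row per input string


def chronological_split(frame: pd.DataFrame, test_fraction: float = 0.2):
    """Split a time-ordered frame so that every test row comes after every train row."""
    cut = len(frame) - int(round(len(frame) * test_fraction))
    return frame.iloc[:cut], frame.iloc[cut:]


def fit_and_score(X_train, y_train, X_test, y_test) -> float:
    """Train an LGBMClassifier with default settings and return test accuracy."""
    from lightgbm import LGBMClassifier
    from sklearn.metrics import accuracy_score

    model = LGBMClassifier(random_state=42)
    model.fit(X_train, y_train)
    return accuracy_score(y_test, model.predict(X_test))


# The split helper can be tried on any date-indexed frame:
demo = pd.DataFrame({"x": range(10)}, index=pd.bdate_range("2016-01-04", periods=10))
train, test = chronological_split(demo)
print(len(train), len(test))
```

Calling `fit_and_score` once with the traditional columns and once with the embedding columns added, on the same chronological split, gives the baseline-versus-full comparison the summary describes.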

Topics

LLM Embeddings
Time Series Forecasting
Feature Engineering
Financial Time Series
Sentence Transformers

Best for: AI Engineer, Machine Learning Engineer, Data Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by MachineLearningMastery.com.