Beyond Tokenization: Direct Timestep Embedding and Contrastive Alignment for Time-Series Question Answering
Summary
The CADE (Contrastive Alignment with Direct Embedding) framework addresses the tokenization bottleneck in Time-Series Question Answering (TSQA) with Large Language Models (LLMs). Standard Byte Pair Encoding (BPE) fragments continuous numerical values into subword pieces, losing magnitude and trend information, while prior patch-based encoders fix the temporal granularity at the patch size. CADE instead uses a point-wise linear encoder and an MLP projector to map each timestep directly into the LLM embedding space. This preserves exact index-level access and handles variable-length series without patching. In addition, a one-directional supervised contrastive loss aligns time-series embeddings with frozen class-name text anchors to strengthen the semantic correspondence between the two modalities. On the Time-MQA benchmark, CADE improves performance consistently across six TSQA tasks, including raising forecasting FCR from 0.46 to 0.596 and reducing imputation MSE from 2,399,043 to 34,532. The resulting 0.6B-parameter model outperforms both open-source and proprietary LLM baselines, including DeepSeek-V3.2, on numeric accuracy and discriminative understanding.
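A minimal NumPy sketch of the point-wise encoder and projector described above. Everything concrete here is an assumption rather than a detail from the paper: encoder width 64, LLM embedding width 1024, a two-layer MLP with GELU, random untrained weights, and no input normalization. It illustrates two properties the summary claims: a series of any length maps to exactly one embedding per timestep, and row t of the output depends only on the value at timestep t.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: D_ENC for the point-wise encoder, D_MODEL for the LLM
# embedding width (the real values depend on the chosen 0.6B backbone).
D_ENC, D_MODEL = 64, 1024

# Point-wise linear encoder: maps each scalar value x_t to a D_ENC vector.
W_enc = rng.normal(0, 0.02, size=(1, D_ENC))
b_enc = np.zeros(D_ENC)

# Two-layer MLP projector into the LLM embedding space (GELU is an assumption).
W1 = rng.normal(0, 0.02, size=(D_ENC, D_MODEL))
b1 = np.zeros(D_MODEL)
W2 = rng.normal(0, 0.02, size=(D_MODEL, D_MODEL))
b2 = np.zeros(D_MODEL)

def gelu(x):
    # tanh approximation of GELU
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def embed_series(series):
    """Map a 1-D series of length T to T embeddings of width D_MODEL.

    No patching: row t of the output is computed from timestep t alone.
    """
    x = np.asarray(series, dtype=np.float64).reshape(-1, 1)  # (T, 1)
    h = x @ W_enc + b_enc                                    # (T, D_ENC)
    return gelu(h @ W1 + b1) @ W2 + b2                       # (T, D_MODEL)

short = embed_series([1.0, 2.5, 3.0])
long_series = embed_series(np.sin(np.linspace(0, 6.28, 517)))
print(short.shape, long_series.shape)  # (3, 1024) (517, 1024)
```

Because no step mixes information across timesteps, embedding `[1.0, 2.5, 3.0]` and taking row 1 gives the same vector as embedding `[2.5]` on its own; any temporal reasoning is left to the LLM that consumes these vectors.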
Key takeaway
For Machine Learning Engineers integrating time series with LLMs, prioritize direct numerical embedding over standard text tokenization. In the reported experiments, preserving the metric structure of time-series data yields substantially better accuracy on numeric tasks such as forecasting and imputation. A lightweight linear encoder followed by an MLP projector for continuous input is worth considering: in the paper's evaluation it outperforms BPE serialization and even larger, general-purpose LLMs on numeric tasks.
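The reported figures can be put in relative terms with a short calculation. The summary does not name the baseline behind the 0.46 and 2,399,043 values, so this sketch simply treats them as "before" numbers.

```python
# Figures as reported in the summary (baseline -> CADE).
fcr_base, fcr_cade = 0.46, 0.596          # forecasting FCR, higher is better
mse_base, mse_cade = 2_399_043, 34_532    # imputation MSE, lower is better

fcr_abs_gain = fcr_cade - fcr_base        # absolute improvement
fcr_rel_gain = fcr_abs_gain / fcr_base    # improvement relative to baseline
mse_ratio = mse_base / mse_cade           # how many times lower the MSE is
mse_rel_drop = 1 - mse_cade / mse_base    # fraction of the error removed

print(f"FCR: +{fcr_abs_gain:.3f} absolute, +{fcr_rel_gain:.1%} relative")
# FCR: +0.136 absolute, +29.6% relative
print(f"MSE: {mse_ratio:.1f}x lower ({mse_rel_drop:.2%} reduction)")
# MSE: 69.5x lower (98.56% reduction)
```

The imputation gain is roughly a 70-fold error reduction, which is consistent with the summary's point that BPE serialization loses magnitude information.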
Key insights
Direct timestep embedding and contrastive alignment overcome LLM tokenization limits for time-series data.
Principles
- BPE tokenization destroys the metric structure of numerical time series.
- Continuous token interfaces are superior to BPE for time series.
- Cross-modal alignment improves shared representations across tasks.
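The loss of metric structure can be illustrated with a toy example. `chunk_tokens` is a hypothetical stand-in for BPE that cuts digit strings into fixed three-character chunks. Learned BPE vocabularies split numbers differently, so treat this only as an illustration. Token overlap between two numbers does not track their numeric distance, whereas a point-wise linear map e(x) = w*x + bias keeps distances exactly proportional: ||e(a) - e(b)|| = |a - b| * ||w||.

```python
import numpy as np

def chunk_tokens(digits: str, width: int = 3) -> list[str]:
    # Hypothetical stand-in for BPE: cut the digit string into fixed-width
    # chunks from the left. Real BPE merges are learned, but they also
    # fragment numbers along string boundaries rather than by magnitude.
    return [digits[i:i + width] for i in range(0, len(digits), width)]

def token_overlap(s1: str, s2: str) -> float:
    # Jaccard overlap of the two token sets (1.0 means identical sets).
    t1, t2 = set(chunk_tokens(s1)), set(chunk_tokens(s2))
    return len(t1 & t2) / len(t1 | t2)

# Point-wise linear encoder e(x) = w * x + bias with fixed, arbitrary weights.
w = np.array([0.3, -1.2, 0.5])
bias = np.array([0.1, 0.0, -0.2])

def linear_embed(x: float) -> np.ndarray:
    return w * x + bias

reference = "1000"
for other in ["999", "1009", "1000000"]:
    numeric_gap = abs(int(other) - int(reference))
    overlap = token_overlap(reference, other)
    emb_gap = np.linalg.norm(linear_embed(int(other)) - linear_embed(int(reference)))
    print(f"{other:>8}: |diff|={numeric_gap:>7}, token overlap={overlap:.2f}, "
          f"embedding gap / |diff|={emb_gap / numeric_gap:.4f}")
```

999 is one unit from 1000 yet shares no chunks with it, while 1000000 is almost a million units away yet shares two of three chunks. The last column is the same for every pair (||w||, about 1.334). This exact proportionality holds only at the linear encoder; the nonlinear MLP projector that follows need not preserve distances exactly.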
Method
CADE embeds each timestep directly with a point-wise linear encoder and an MLP projector, then uses a one-directional supervised contrastive loss to align these embeddings with frozen class-name text anchors.
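A minimal NumPy sketch of a one-directional contrastive loss against frozen class anchors. It computes only the forward value. Several details are assumptions the summary does not state: each series is pooled to a single vector, similarity is cosine with temperature `tau`, and every class has exactly one text anchor. Under that last assumption the loss reduces to a softmax cross-entropy over the anchors. "One-directional" is modeled by scoring time series against anchors only, with no anchor-to-series term. "Frozen" means the anchors are constants a training loop would not update. The function name `one_directional_anchor_loss` is hypothetical.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-8):
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def one_directional_anchor_loss(ts_emb, anchor_emb, labels, tau=0.07):
    """Time series -> text contrastive loss against frozen class-name anchors.

    ts_emb:     (N, d) pooled time-series embeddings (the trainable side)
    anchor_emb: (C, d) frozen text embeddings of the C class names
    labels:     (N,) integer class index of each series
    """
    z = l2_normalize(ts_emb)
    a = l2_normalize(anchor_emb)   # frozen: treated as constants, never updated
    logits = z @ a.T / tau         # (N, C) cosine similarity / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Only the series -> anchor direction is scored; there is no anchor -> series term.
    return -log_probs[np.arange(len(labels)), labels].mean()

rng = np.random.default_rng(0)
anchors = rng.normal(size=(3, 16))      # stand-in for frozen class-name embeddings
labels = np.array([0, 1, 2, 1])
ts = anchors[labels] + 0.05 * rng.normal(size=(4, 16))  # series near their anchors

loss_aligned = one_directional_anchor_loss(ts, anchors, labels)
loss_shifted = one_directional_anchor_loss(ts, anchors, (labels + 1) % 3)
print(loss_aligned, loss_shifted)
```

Shifting every label by one class makes each target anchor a wrong one, so the second loss is larger. In an autodiff framework, freezing would correspond to stopping gradients into `anchor_emb` so that only the time-series side moves.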
In practice
- Implement direct linear projection for time-series input to LLMs.
- Use one-directional contrastive loss for semantic alignment.
- Prioritize continuous token interfaces over BPE for numerical data.
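A continuous token interface can be sketched as splicing timestep embeddings directly into the LLM's input embedding sequence. The layout below is hypothetical: the vocabulary table, token IDs, toy width, and prefix/series/suffix placement are illustrative assumptions, not details from the paper.

```python
import numpy as np

D_MODEL = 8  # toy width; a real backbone uses its own hidden size

rng = np.random.default_rng(0)
vocab_table = rng.normal(size=(100, D_MODEL))  # stand-in for the LLM's token embedding matrix

def build_inputs(prefix_ids, ts_embeddings, suffix_ids):
    """Splice continuous timestep embeddings between two text spans.

    Returns the (L, D_MODEL) input sequence and the sequence position of
    each timestep, so timestep t can be located exactly.
    """
    prefix = vocab_table[np.asarray(prefix_ids)]
    suffix = vocab_table[np.asarray(suffix_ids)]
    seq = np.concatenate([prefix, ts_embeddings, suffix], axis=0)
    ts_positions = np.arange(len(prefix), len(prefix) + len(ts_embeddings))
    return seq, ts_positions

ts = rng.normal(size=(5, D_MODEL))  # e.g. output of a point-wise encoder + projector
seq, pos = build_inputs([12, 7, 3], ts, [40, 41])
print(seq.shape, pos)  # (10, 8) [3 4 5 6 7]
```

Because each timestep occupies exactly one position, timestep t always sits at `pos[t]`. Under BPE serialization, the position of timestep t depends on how many subword tokens the preceding values happened to split into.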
Topics
- Time-Series Question Answering
- Large Language Models
- Direct Timestep Embedding
- Contrastive Learning
- Tokenization Bottleneck
- Time-MQA Benchmark
Best for: Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CL updates on arXiv.org.