Beyond Tokenization: Direct Timestep Embedding and Contrastive Alignment for Time-Series Question Answering

Source: cs.CL updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics · Depth: Expert, extended

Summary

The CADE (Contrastive Alignment with Direct Embedding) framework addresses the tokenization bottleneck that Large Language Models (LLMs) face in Time-Series Question Answering (TSQA). Standard Byte Pair Encoding (BPE) fragments continuous numerical values into subword pieces, losing magnitude and trend information, while prior patch-based encoders impose a fixed temporal granularity. CADE instead uses a point-wise linear encoder and an MLP projector to map each timestep directly into the LLM embedding space, which preserves exact index-level access and handles variable series lengths without patching. In addition, a one-directional supervised contrastive loss aligns time-series embeddings with frozen class-name text anchors to strengthen semantic correspondence. Experiments on the Time-MQA benchmark show consistent improvements across six TSQA tasks, including raising forecasting FCR from 0.46 to 0.596 and reducing imputation MSE from 2,399,043 to 34,532. The 0.6B model outperforms both open-source and proprietary LLM baselines, including DeepSeek-V3.2, on numeric accuracy and discriminative understanding.

Key takeaway

Machine Learning Engineers integrating time series with LLMs should prioritize direct numerical embedding over standard text tokenization. Preserving the metric structure of time-series data yields markedly better accuracy on tasks such as forecasting and imputation: on Time-MQA, imputation MSE falls from 2,399,043 to 34,532. Consider a lightweight linear encoder followed by an MLP projector for continuous input, since the reported results show this approach outperforming BPE serialization and even larger general-purpose LLMs on numeric tasks.
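
The encoder-plus-projector idea in the takeaway can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the paper's implementation: the layer widths (`d_enc`, `d_model`), the ReLU activation, the initialization scale, and the names `init_params` and `embed_timesteps` are all hypothetical choices made for the example.

```python
import numpy as np


def init_params(d_enc=16, d_model=32, seed=0):
    """Random weights for the sketch; all sizes are hypothetical."""
    rng = np.random.default_rng(seed)
    return {
        "w_enc": rng.normal(0.0, 0.02, size=(1, d_enc)),   # point-wise linear encoder
        "b_enc": np.zeros(d_enc),
        "w1": rng.normal(0.0, 0.02, size=(d_enc, d_model)),  # MLP projector, layer 1
        "b1": np.zeros(d_model),
        "w2": rng.normal(0.0, 0.02, size=(d_model, d_model)),  # MLP projector, layer 2
        "b2": np.zeros(d_model),
    }


def embed_timesteps(series, params):
    """Map a 1-D series of length T to T embeddings of width d_model.

    Each timestep is embedded independently (no patching), so row t of the
    output depends only on series[t] and any series length is accepted.
    """
    x = np.asarray(series, dtype=np.float64).reshape(-1, 1)   # (T, 1)
    h = x @ params["w_enc"] + params["b_enc"]                 # (T, d_enc)
    h = np.maximum(h @ params["w1"] + params["b1"], 0.0)      # ReLU is an assumption
    return h @ params["w2"] + params["b2"]                    # (T, d_model)


params = init_params()
short = embed_timesteps([0.5, 1.25, -3.0], params)
longer = embed_timesteps(np.linspace(0.0, 1.0, 12), params)
print(short.shape, longer.shape)
```

Because every row is computed from a single value, the output sequence has one embedding per timestep, and these rows would stand in for token embeddings at the LLM input. How the embeddings are interleaved with the text prompt is not covered by this summary and is left out of the sketch.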

Key insights

Direct timestep embedding and contrastive alignment overcome LLM tokenization limits for time-series data.

Method

CADE embeds each timestep directly with a point-wise linear encoder followed by an MLP projector. A one-directional supervised contrastive loss then aligns the resulting time-series embeddings with frozen class-name text anchors.
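
One plausible reading of the alignment step is an InfoNCE-style objective computed only in the series-to-text direction. The sketch below is written under that assumption; the summary does not give the exact loss, so the cosine similarity, the `temperature` value, and the function name `one_directional_contrastive_loss` are illustrative rather than taken from the paper.

```python
import numpy as np


def one_directional_contrastive_loss(series_emb, anchor_emb, labels, temperature=0.07):
    """Cross-entropy over similarities from each series to every class anchor.

    series_emb: (N, D) pooled time-series embeddings (trainable side).
    anchor_emb: (C, D) text embeddings of the class names, treated as constants;
                in a training framework they would receive no gradient.
    labels:     (N,) integer class index for each series.
    Only the series -> anchor direction is scored; there is no anchor -> series term.
    """
    z = series_emb / np.linalg.norm(series_emb, axis=1, keepdims=True)
    a = anchor_emb / np.linalg.norm(anchor_emb, axis=1, keepdims=True)
    logits = z @ a.T / temperature                       # (N, C) scaled cosine similarity
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    labels = np.asarray(labels)
    return float(-log_probs[np.arange(len(labels)), labels].mean())


anchors = np.eye(3)                # three hypothetical class-name anchors
series = anchors[[0, 1, 2]] + 0.01  # embeddings already close to their own anchor
print(one_directional_contrastive_loss(series, anchors, [0, 1, 2]))
print(one_directional_contrastive_loss(series, anchors, [1, 2, 0]))
```

The first call scores correctly matched pairs and the second scores deliberately mismatched labels, so the second loss is much larger. Keeping the anchors fixed means only the time-series side moves during training, which is what makes the alignment one-directional in this sketch.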

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CL updates on arXiv.org.