EASE-TTT: Evidence-Aligned Selective Test-Time Training for Long-Context Question Answering
Summary
EASE-TTT (Evidence-Aligned Selective Test-Time Training) is a framework for improving long-context question answering (QA) in smaller language models, which often fail to reliably access answer-bearing evidence. Within-context retrieval methods modify the input, and query-only test-time training (qTTT) adapts the model with a generic objective. EASE-TTT instead selects question-relevant evidence chunks and converts them into a soft attention supervision target. This target guides updates to query-side LoRA adapters during test-time adaptation, after which the model generates its answer from the original full context. Experiments on six LongBench QA tasks, including MuSiQue and HotpotQA, with Qwen3-0.6B, Qwen3-1.7B, and Llama-3.2-1B show that EASE-TTT delivers superior macro-average performance. With Qwen3-1.7B it reaches an average score of 30.6, which is 5.6 points above full-context inference and 1.9 points above qTTT, while per-example runtime rises moderately from 6.7s to 9.1s.
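The Qwen3-1.7B figures above imply baseline scores and a latency overhead that the summary does not state directly. The short calculation below derives them from the quoted numbers only; the variable names are illustrative, and the derived values are arithmetic consequences of the summary rather than figures taken from the paper.

```python
# Figures quoted in the summary for Qwen3-1.7B.
ease_score = 30.6              # EASE-TTT macro-average score
gain_over_full_context = 5.6   # points above full-context inference
gain_over_qttt = 1.9           # points above qTTT
runtime_before_s = 6.7         # per-example runtime before the increase
runtime_after_s = 9.1          # per-example runtime with EASE-TTT

# Implied baseline scores (derived, not quoted from the paper).
full_context_score = round(ease_score - gain_over_full_context, 1)
qttt_score = round(ease_score - gain_over_qttt, 1)

# Absolute and relative latency overhead.
added_latency_s = round(runtime_after_s - runtime_before_s, 1)
relative_slowdown_pct = round(
    100 * (runtime_after_s - runtime_before_s) / runtime_before_s, 1
)

print(f"implied full-context score: {full_context_score}")
print(f"implied qTTT score:         {qttt_score}")
print(f"added latency per example:  {added_latency_s}s")
print(f"relative slowdown:          {relative_slowdown_pct}%")
```

In relative terms, the added 2.4s is a bit more than a third of the starting runtime, which is worth weighing against the score gain for latency-sensitive deployments.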
Key takeaway
If you deploy smaller language models for long-context question answering and they fail to use evidence that is already present in the context, consider EASE-TTT. It improves answer quality by guiding query-side attention with evidence-aligned supervision, and it outperforms both generic test-time training and retrieval-only methods. It does add moderate latency (about 2.4s more per example for Qwen3-1.7B, from 6.7s to 9.1s), but the accuracy gains on tasks like 2WikiMultihopQA and QASPER can justify that overhead for critical applications.
Key insights
Retrieved evidence can directly supervise query-side attention adaptation for improved long-context QA in smaller LLMs.
Principles
- Small LLMs struggle with evidence access in long, distractor-heavy contexts.
- Evidence-aligned attention supervision improves test-time adaptation.
- Soft attention targets offer more stable guidance than hard masking.
Method
EASE-TTT segments the context into chunks, ranks the chunks by question-conditioned utility, and selects the top-K. It then constructs a soft attention target from the selected chunks and uses KL divergence against that target to update the query-side LoRA adapters.
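The pipeline above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the paper's implementation: the utility scores are given as inputs, the target is built by label-smoothing-style mass splitting with a hypothetical `smoothing` parameter, and the KL direction (target relative to model attention) is a guess, since the summary specifies none of these details.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def select_top_k(utilities, k):
    """Indices of the k chunks with the highest utility score."""
    order = sorted(range(len(utilities)), key=lambda i: utilities[i], reverse=True)
    return order[:k]

def soft_attention_target(chunk_of_token, selected, smoothing=0.1):
    """Distribution over context tokens (assumed construction).

    Tokens in selected chunks share (1 - smoothing) of the mass; all other
    tokens share the remaining `smoothing`, so nothing is masked to zero.
    """
    in_selected = [c in selected for c in chunk_of_token]
    n_in = sum(in_selected)
    n_out = len(in_selected) - n_in
    if n_in == 0 or n_out == 0:
        # Degenerate case: fall back to a uniform target.
        return [1.0 / len(in_selected)] * len(in_selected)
    return [(1.0 - smoothing) / n_in if flag else smoothing / n_out
            for flag in in_selected]

def kl_divergence(target, predicted, eps=1e-12):
    """KL(target || predicted) for two distributions of equal length."""
    return sum(t * math.log((t + eps) / (p + eps))
               for t, p in zip(target, predicted) if t > 0)

# Toy example: 4 chunks of 2 tokens each, with made-up utility scores.
utilities = [0.2, 1.5, 0.1, 0.9]
chunk_of_token = [0, 0, 1, 1, 2, 2, 3, 3]

selected = set(select_top_k(utilities, k=2))          # chunks 1 and 3
target = soft_attention_target(chunk_of_token, selected)
attention = softmax([0.0] * len(chunk_of_token))      # uniform model attention
loss = kl_divergence(target, attention)               # signal for the adapter update
```

In a real setup the `attention` distribution would come from the model's query-to-context attention weights, and `loss` would be backpropagated only into the query-side LoRA parameters.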
In practice
- Prioritize utility-based evidence selection over lexical methods.
- Employ soft attention targets for robust adaptation.
- Utilize LoRA for efficient query-side parameter updates.
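To make the last recommendation concrete, the sketch below shows a LoRA adapter on a single query projection in plain Python. The class name `LoRAQueryProjection`, the dimensions, and the initialization scale are illustrative assumptions; only the standard LoRA form (frozen weight plus a scaled low-rank product, with the `B` factor initialized to zero) is taken as given.

```python
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

class LoRAQueryProjection:
    """Frozen query weight W_q plus a trainable low-rank update (alpha / r) * B @ A."""

    def __init__(self, w_q, rank, alpha, seed=0):
        rng = random.Random(seed)
        self.d_out, self.d_in = len(w_q), len(w_q[0])
        self.rank = rank
        self.w_q = w_q  # frozen base weight, never updated at test time
        # A starts small and random, B starts at zero, so the adapter is a no-op at first.
        self.a = [[rng.gauss(0.0, 0.01) for _ in range(self.d_in)] for _ in range(rank)]
        self.b = [[0.0] * rank for _ in range(self.d_out)]
        self.scale = alpha / rank

    def effective_weight(self):
        """W_q + scale * (B @ A), with the same shape as W_q."""
        delta = matmul(self.b, self.a)
        return [[w + self.scale * d for w, d in zip(w_row, d_row)]
                for w_row, d_row in zip(self.w_q, delta)]

    def project(self, x):
        """Apply the adapted query projection to one input vector."""
        return [sum(w * xi for w, xi in zip(row, x)) for row in self.effective_weight()]

    def num_trainable(self):
        """Only A and B are trained: rank * (d_in + d_out) parameters."""
        return self.rank * (self.d_in + self.d_out)

# Toy example with a 3x3 identity query weight and rank-1 adapter.
w_q = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
layer = LoRAQueryProjection(w_q, rank=1, alpha=2.0)
x = [1.0, 2.0, 3.0]
base_output = layer.project(x)  # identical to the frozen projection while B is zero
```

Because only `A` and `B` change during adaptation, the base weights stay intact and the per-example update touches `rank * (d_in + d_out)` parameters instead of `d_in * d_out`; the saving grows with the hidden size.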
Topics
- EASE-TTT
- Long-Context QA
- Test-Time Training
- Small Language Models
- Attention Mechanisms
- Within-Context Retrieval
Best for: AI Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CL updates on arXiv.org.