EASE-TTT: Evidence-Aligned Selective Test-Time Training for Long-Context Question Answering

2026-06-08 · Source: cs.CL updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics, Natural Language Processing · Depth: Expert, extended

Summary

EASE-TTT (Evidence-Aligned Selective Test-Time Training) is a framework for improving long-context question answering (QA) in smaller language models, which often fail to reliably access answer-bearing evidence. Within-context retrieval methods modify the input, and query-only test-time training (qTTT) adapts the model with a generic objective. EASE-TTT instead selects question-relevant evidence chunks and converts them into a soft attention supervision target. This target guides updates to query-side LoRA adapters during test-time adaptation, after which the model generates its answer from the original full context. In experiments on six LongBench QA tasks, including MuSiQue and HotpotQA, with Qwen3-0.6B, Qwen3-1.7B, and Llama-3.2-1B, EASE-TTT achieves higher macro-average performance than the compared baselines. With Qwen3-1.7B it reaches an average score of 30.6, which is 5.6 points above full-context inference and 1.9 points above qTTT, while per-example runtime rises moderately from 6.7s to 9.1s.

Key takeaway

If you deploy smaller language models for long-context question answering and they struggle to use evidence that is already present in the context, consider EASE-TTT. It improves answer quality by guiding query-side attention with evidence-aligned supervision, and it outperforms both generic test-time training and retrieval-only methods. The cost is a moderate latency increase (about 2.4s per example for Qwen3-1.7B, from 6.7s to 9.1s), an overhead that the accuracy gains on tasks such as 2WikiMultihopQA and QASPER can justify for critical applications.

Key insights

Retrieved evidence can directly supervise query-side attention adaptation for improved long-context QA in smaller LLMs.

Principles

Small LLMs struggle with evidence access in long, distractor-heavy contexts.
Evidence-aligned attention supervision improves test-time adaptation.
Soft attention targets offer more stable guidance than hard masking.

Method

EASE-TTT segments the context into chunks, ranks them by question-conditioned utility, and selects the top-K. It then constructs a soft attention target from the selected chunks and updates the query-side LoRA adapters using a KL divergence loss against that target.
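
The steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the paper's implementation: the summary does not give the utility function, the exact form of the soft target, or the direction of the KL term. Here the utility scores are supplied as hypothetical numbers, the target is assumed to be a uniform distribution over evidence tokens blended with a small uniform floor over all tokens, and the loss is assumed to be KL(target || attention). All function names and the `floor` parameter are invented for illustration.

```python
import math

def segment(tokens, chunk_size):
    """Split a token list into consecutive (start, end) spans."""
    return [(i, min(i + chunk_size, len(tokens)))
            for i in range(0, len(tokens), chunk_size)]

def select_top_k(spans, utilities, k):
    """Keep the k spans with the highest utility, returned in document order."""
    ranked = sorted(range(len(spans)), key=lambda i: utilities[i], reverse=True)[:k]
    return [spans[i] for i in sorted(ranked)]

def soft_target(n_tokens, evidence_spans, floor=0.1):
    """Assumed target: uniform mass on evidence tokens plus a uniform floor everywhere."""
    in_evidence = [any(s <= t < e for s, e in evidence_spans) for t in range(n_tokens)]
    n_ev = sum(in_evidence)
    return [(1 - floor) * (1 / n_ev if ev else 0.0) + floor / n_tokens
            for ev in in_evidence]

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [x / z for x in exps]

def kl_divergence(p, q):
    """KL(p || q) for two distributions over the same positions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

tokens = [f"tok{i}" for i in range(12)]
spans = segment(tokens, chunk_size=4)             # three chunks of four tokens
utilities = [0.2, 0.9, 0.1]                       # hypothetical question-conditioned scores
evidence = select_top_k(spans, utilities, k=1)    # keeps the middle chunk
target = soft_target(len(tokens), evidence)

# Attention before adaptation (uniform) and after it shifts toward the evidence chunk.
attn_before = softmax([0.0] * 12)
attn_after = softmax([0.0] * 4 + [2.0] * 4 + [0.0] * 4)

loss_before = kl_divergence(target, attn_before)
loss_after = kl_divergence(target, attn_after)
print(loss_before, loss_after)
```

In this toy setup the loss is lower once attention concentrates on the selected chunk, which is the signal a test-time update would follow. Because the floor keeps every position above zero in the target, non-selected tokens are down-weighted rather than excluded.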

In practice

Prioritize utility-based evidence selection over lexical methods.
Employ soft attention targets for robust adaptation.
Utilize LoRA for efficient query-side parameter updates.
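
To make the last point concrete, here is a dependency-free sketch of a LoRA adapter on a query projection. It follows the standard LoRA formulation (a frozen weight plus a scaled low-rank product, with one factor initialized to zero), but the class name, dimensions, rank, and scaling value are hypothetical, and the summary does not say which hyperparameters EASE-TTT uses. "Query-side" is taken here to mean that only the query projection receives an adapter.

```python
import random

def matmul(a, b):
    """Multiply two matrices stored as nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

class LoRAQuery:
    """Frozen query projection W_q plus a trainable update (alpha / rank) * B @ A."""

    def __init__(self, w_q, rank, alpha, seed=0):
        rng = random.Random(seed)
        d_out, d_in = len(w_q), len(w_q[0])
        self.w_q = w_q                                   # frozen base weight
        self.a = [[rng.gauss(0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.b = [[0.0] * rank for _ in range(d_out)]    # zero init: no change at start
        self.scale = alpha / rank

    def effective_weight(self):
        """Weight actually applied to queries: W_q + scale * (B @ A)."""
        delta = matmul(self.b, self.a)
        return [[w + self.scale * d for w, d in zip(w_row, d_row)]
                for w_row, d_row in zip(self.w_q, delta)]

    def trainable_params(self):
        """Number of parameters in A and B, the only ones updated at test time."""
        return len(self.a) * len(self.a[0]) + len(self.b) * len(self.b[0])

d_model = 16
rng = random.Random(1)
w_q = [[rng.gauss(0, 0.1) for _ in range(d_model)] for _ in range(d_model)]
adapter = LoRAQuery(w_q, rank=2, alpha=4)

full_params = d_model * d_model
print(adapter.trainable_params(), full_params)
```

With B initialized to zero, the adapted projection equals the base projection before any update, so test-time training starts from the unmodified model and the adapter can be discarded after each example. The parameter saving grows with model width, since the adapter scales with `rank * d_model` rather than `d_model ** 2`.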

Topics

EASE-TTT
Long-Context QA
Test-Time Training
Small Language Models
Attention Mechanisms
Within-Context Retrieval

Best for: AI Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CL updates on arXiv.org.