EndPrompt: Efficient Long-Context Extension via Terminal Anchoring

2026-05-14 · Source: Computation and Language · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Software Development & Engineering · Depth: Expert, quick

Summary

EndPrompt extends the context window of large language models (LLMs) from 8K to 64K tokens using only short training sequences, avoiding the quadratic memory and compute cost of training on full-length inputs. It keeps the original short context as a first segment, appends a brief terminal prompt as a second segment, and assigns the terminal prompt positional indices near the target context length. This two-segment construction exposes the model to both local and long-range relative distances within a physically short sequence while maintaining semantic continuity. Applied to LLaMA-family models, EndPrompt achieved an average RULER score of 76.03 and the highest average on LongBench, outperforming LCEG (72.24), LongLoRA (72.95), and full-length fine-tuning (69.23) while using significantly less computation.

Key takeaway

For AI engineers and research scientists who need to extend LLM context windows without prohibitive compute costs, EndPrompt is an efficient alternative to full-length fine-tuning. Its sparse positional supervision lets a model generalize to long contexts from much shorter training sequences and with substantially less compute.

Key insights

EndPrompt extends LLM context windows efficiently using short training sequences and sparse positional supervision.

Principles

Long-range relative positional distances don't require full-length inputs.
Position interpolation induces smoothness over the attention function.
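
The first principle can be illustrated with rotary position embeddings (RoPE), which appear in the topic list below. Under RoPE, the attention score between a query and a key depends only on the difference between their position indices, so two tokens that sit close together in memory can still exercise a long-range distance if their indices are far apart. The sketch below is a minimal pure-Python RoPE, not the paper's code; the function names `rope_rotate` and `rope_score` and the base of 10000 are assumptions chosen for illustration.

```python
import math


def rope_rotate(vec, pos, base=10000.0):
    """Rotate consecutive pairs of an even-length vector by position-dependent angles.

    Hypothetical helper: a textbook RoPE rotation, not taken from the EndPrompt code.
    """
    dim = len(vec)
    out = []
    for i in range(0, dim, 2):
        theta = pos * base ** (-i / dim)  # frequency for the pair starting at index i
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out.extend([x * c - y * s, x * s + y * c])
    return out


def rope_score(q, q_pos, k, k_pos):
    """Dot product of a query and a key after applying RoPE at their positions."""
    q_rot = rope_rotate(q, q_pos)
    k_rot = rope_rotate(k, k_pos)
    return sum(a * b for a, b in zip(q_rot, k_rot))


q = [0.3, -1.2, 0.7, 0.5]
k = [1.1, 0.4, -0.6, 0.9]

# Both pairs are 64,900 positions apart, so the scores agree up to rounding error,
# even though the absolute indices differ.
far_a = rope_score(q, 65000, k, 100)
far_b = rope_score(q, 64900, k, 0)
print(abs(far_a - far_b) < 1e-6)
```

Because only the index gap matters, a short sequence with a deliberate jump in its position indices presents the same long-range distances that a full-length input would.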

Method

Preserve a short context as a first segment, append a brief terminal prompt as a second segment, and assign it positional indices near the target context length to introduce long-range relative distances.
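
The position assignment described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `endprompt_position_ids` is hypothetical, and placing the terminal prompt so that it ends exactly at the last index of the target window is one assumed reading of "near the target context length".

```python
def endprompt_position_ids(context_len, prompt_len, target_len):
    """Build position indices for a two-segment training sequence.

    Assumption: the context keeps indices 0..context_len-1 and the terminal
    prompt occupies the last prompt_len indices of the target window.
    """
    if context_len + prompt_len > target_len:
        raise ValueError("segments do not fit inside the target context length")
    first_segment = list(range(context_len))
    second_segment = list(range(target_len - prompt_len, target_len))
    return first_segment + second_segment


# Toy sizes: a 4-token context and a 2-token terminal prompt in a 16-token window.
print(endprompt_position_ids(4, 2, 16))  # [0, 1, 2, 3, 14, 15]

# Sizes matching the 8K-to-64K setting; the 32-token prompt length is an assumption.
ids = endprompt_position_ids(8192, 32, 65536)
print(len(ids))          # physical sequence length: 8224
print(ids[-1] - ids[0])  # largest relative distance: 65535
```

The physical sequence stays only slightly longer than the original context, so attention cost is that of a short sequence, while the index gap between the two segments supplies relative distances up to the target length.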

In practice

Extend LLaMA-family models from 8K to 64K context.
Achieve high RULER and LongBench scores with reduced compute.

Topics

Long-Context Extension
Large Language Models
Rotary Position Embedding
LLaMA-family Models
RULER Score

Code references

clx1415926/EndPrompt

Best for: AI Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Computation and Language.