EndPrompt: Efficient Long-Context Extension via Terminal Anchoring

2026-05-15 · Source: cs.CL updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Software Development & Engineering · Depth: Expert, extended

Summary

EndPrompt is a method for extending the context window of large language models (LLMs) from 8K to 64K tokens without expensive training on full-length sequences. Developed by researchers from Nankai University, Baidu Inc., and Shanghai Jiao Tong University, it uses a two-segment input: the original short context is kept intact as the first segment, and a brief terminal prompt is appended as the second. The terminal prompt is assigned positional indices near the target context length, so a physically short sequence exposes the model to both local and long-range relative distances. Unlike chunk-based methods, this keeps the source text contiguous and so preserves semantic continuity. EndPrompt achieves an average RULER score of 76.03 and the highest average on LongBench, outperforming LCEG (72.24), LongLoRA (72.95), and full-length fine-tuning (69.23) while reducing compute and memory cost (52% less memory and 1.41x faster training than full fine-tuning). The method is theoretically grounded in Rotary Position Embedding (RoPE) and Position Interpolation (PI), and the results indicate that sparse positional supervision can induce robust long-context generalization.

Key takeaway

For NLP engineers and research scientists who need longer LLM context windows, EndPrompt offers an alternative to costly full-length fine-tuning. Training on short sequences with a terminal prompt placed at far positional indices gives stronger long-context generalization than the reported baselines, preserves short-text capabilities, and cuts compute and training time. It is worth considering, especially for LLaMA-family models, when context must scale without a prohibitive training budget.

Key insights

Sparse positional supervision with a terminal anchor can efficiently extend LLM context windows without full-length training.

Principles

Semantic continuity is crucial for effective context extension.
Position interpolation induces smoothness over unobserved distances.
Shared Transformer parameters unify multi-scale positional constraints.
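
The first two principles rest on how RoPE and Position Interpolation treat positions. The sketch below is an illustration under stated assumptions, not the paper's code: `rope_rotate` and `attention_score` are hypothetical helper names, the adjacent-pair rotation layout and `base=10000.0` are the common RoPE defaults rather than anything the summary specifies, and the query and key vectors are made-up numbers. It shows that a RoPE attention score depends only on the relative distance between two positions, and that scaling positions by 8K/64K keeps every index of the 64K window inside the range seen in pretraining.

```python
import math


def rope_rotate(vec, position, scale=1.0, base=10000.0):
    """Rotate consecutive pairs of `vec` by angles proportional to `position`.

    `scale` below 1.0 applies Position Interpolation: positions are
    compressed before the rotation angle is computed.
    """
    dim = len(vec)
    if dim % 2 != 0:
        raise ValueError("RoPE needs an even vector length")
    rotated = []
    for i in range(0, dim, 2):
        # Pair k = i / 2 uses frequency base ** (-2k / dim) = base ** (-i / dim).
        angle = position * scale * base ** (-i / dim)
        cos_a, sin_a = math.cos(angle), math.sin(angle)
        x, y = vec[i], vec[i + 1]
        rotated.extend([x * cos_a - y * sin_a, x * sin_a + y * cos_a])
    return rotated


def attention_score(query, key, q_pos, k_pos, scale=1.0):
    """Dot product of a RoPE-rotated query and key (no softmax, no scaling)."""
    q = rope_rotate(query, q_pos, scale)
    k = rope_rotate(key, k_pos, scale)
    return sum(a * b for a, b in zip(q, k))


ORIGINAL_LEN, TARGET_LEN = 8192, 65536
PI_SCALE = ORIGINAL_LEN / TARGET_LEN  # 0.125 for the 8K -> 64K extension

# Toy vectors, chosen arbitrarily for the illustration.
query = [0.3, -1.2, 0.7, 0.5]
key = [1.0, 0.4, -0.6, 0.9]

# Same relative distance (60) at very different absolute positions.
near = attention_score(query, key, 100, 40, PI_SCALE)
far = attention_score(query, key, 65500, 65440, PI_SCALE)
print(abs(near - far) < 1e-9)

# The largest interpolated position stays below the original 8K range.
print((TARGET_LEN - 1) * PI_SCALE < ORIGINAL_LEN)
```

Because the score is a smooth function of the (scaled) relative distance, supervising a few distances near the target length can plausibly constrain the distances in between, which is the intuition behind the smoothness principle above.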

Method

Retain original short context, append a brief terminal prompt, and assign the prompt positional indices near the target context length to create sparse long-range supervision.
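
The position assignment described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, the 32-token prompt length is an arbitrary example value, and placing the terminal prompt in the very last slots of the target window is an assumption (the summary only says the indices are "near the target context length").

```python
def endprompt_position_ids(context_len, prompt_len, target_len):
    """Position ids for a two-segment input: [short context | terminal prompt].

    The context keeps its natural ids 0..context_len-1. The terminal prompt
    is assumed to occupy the final `prompt_len` ids of the target window.
    """
    if context_len < 1 or prompt_len < 1:
        raise ValueError("both segments must be non-empty")
    if context_len + prompt_len > target_len:
        raise ValueError("segments do not fit in the target window")
    context_ids = list(range(context_len))
    prompt_ids = list(range(target_len - prompt_len, target_len))
    return context_ids + prompt_ids


def long_range_distances(position_ids, context_len):
    """Smallest and largest distance from a prompt token back to a context token."""
    context_ids = position_ids[:context_len]
    prompt_ids = position_ids[context_len:]
    nearest = prompt_ids[0] - context_ids[-1]
    farthest = prompt_ids[-1] - context_ids[0]
    return nearest, farthest


# 8K context plus a hypothetical 32-token terminal prompt, 64K target window.
ids = endprompt_position_ids(context_len=8192, prompt_len=32, target_len=65536)
print(len(ids))                         # 8224 tokens are actually processed
print(ids[8191], ids[8192], ids[-1])    # 8191 65504 65535
print(long_range_distances(ids, 8192))  # (57313, 65535)
```

Under these assumptions the model processes 8,224 tokens yet sees relative distances up to 65,535. Distances between the local range and the long range are never observed directly, which is why the supervision is described as sparse.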

In practice

Extend LLaMA-family models from 8K to 64K context.
Achieve superior performance on LongBench and RULER benchmarks.
Reduce memory use by 52% and speed up training by 1.41x compared with full fine-tuning.

Topics

EndPrompt
Context Window Extension
Rotary Position Embedding
Position Interpolation
Sparse Positional Supervision

Code references

clx1415926/EndPrompt

Best for: NLP Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, AI Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CL updates on arXiv.org.