EndPrompt: Efficient Long-Context Extension via Terminal Anchoring
Summary
EndPrompt is a method for extending the context window of large language models (LLMs) from 8K to 64K tokens without expensive training on full-length sequences. Developed by researchers from Nankai University, Baidu Inc., and Shanghai Jiao Tong University, it uses a two-segment input: the original short context is kept intact as the first segment, and a brief terminal prompt is appended as the second. The terminal prompt is assigned positional indices near the target context length, so a physically short sequence exposes the model to both local and long-range relative distances. Unlike chunk-based methods, this keeps the context semantically continuous. EndPrompt achieves an average RULER score of 76.03 and the highest average on LongBench, outperforming baselines such as LCEG (72.24), LongLoRA (72.95), and full-length fine-tuning (69.23) while significantly reducing computational cost and memory footprint. The method is grounded theoretically in Rotary Position Embedding (RoPE) and Position Interpolation (PI), and the results indicate that sparse positional supervision can induce robust long-context generalization.
Key takeaway
For NLP engineers and research scientists who need to extend LLM context windows, EndPrompt is an alternative to costly full-length fine-tuning. Training on short sequences with a terminal prompt placed near the target length yields stronger long-context generalization than the reported baselines while preserving short-text capabilities, and it cuts compute and training time. It is worth considering, especially for LLaMA-family models, when context must scale without a large training budget.
Key insights
Sparse positional supervision with a terminal anchor can efficiently extend LLM context windows without full-length training.
Principles
- Semantic continuity is crucial for effective context extension.
- Position interpolation induces smoothness over unobserved distances.
- Shared Transformer parameters unify multi-scale positional constraints.
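
The first two principles rest on standard properties of RoPE and PI. As a minimal sketch (not the paper's code), the Python below rotates a single 2-D query/key pair in the RoPE style and checks that the attention score depends only on the relative distance between the two positions. It then applies PI by scaling positions by 8K/64K, which maps every position in the target window back into the trained range. The function names, the single frequency `theta`, and the toy vectors are illustrative assumptions.

```python
import math

def rotate(vec, position, theta=0.01, scale=1.0):
    """Rotate a 2-D vector by (position * scale * theta), as RoPE does for one frequency pair."""
    angle = position * scale * theta
    x, y = vec
    return (x * math.cos(angle) - y * math.sin(angle),
            x * math.sin(angle) + y * math.cos(angle))

def score(q, k, q_pos, k_pos, scale=1.0):
    """Dot product of the rotated query and the rotated key."""
    qr = rotate(q, q_pos, scale=scale)
    kr = rotate(k, k_pos, scale=scale)
    return qr[0] * kr[0] + qr[1] * kr[1]

# Toy query and key vectors (hypothetical values).
q, k = (1.0, 0.5), (0.3, -0.8)

# The same relative distance (100) at two different absolute offsets.
near = score(q, k, 100, 0)
far = score(q, k, 60100, 60000)

# Position Interpolation: scale positions by train_len / target_len.
PI_SCALE = 8192 / 65536
max_scaled_position = 65535 * PI_SCALE  # largest target position after scaling
```

Here `near` and `far` agree up to floating-point error, which is the relative-position property that lets a few long-range distances supervise the model. With `PI_SCALE` applied, the largest target position lands just below 8192, so the angles the model sees stay inside the range covered by its original training.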
Method
Retain original short context, append a brief terminal prompt, and assign the prompt positional indices near the target context length to create sparse long-range supervision.
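
The index assignment described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the summary says only that the prompt indices sit "near" the target length, so the sketch assumes the prompt occupies the last `prompt_len` positions of the target window, and the prompt length of 32 tokens is a hypothetical value. Both function names are invented for this example.

```python
def endprompt_position_ids(context_len, prompt_len, target_len):
    """Position ids for a short context followed by a terminal prompt.

    Assumption: the terminal prompt takes the last `prompt_len` positions
    of the target window; the source only says "near" the target length.
    """
    if context_len + prompt_len > target_len:
        raise ValueError("segments do not fit in the target window")
    context_ids = list(range(context_len))            # 0 .. context_len - 1
    prompt_start = target_len - prompt_len
    prompt_ids = list(range(prompt_start, target_len))
    return context_ids + prompt_ids

def cross_segment_distances(position_ids, context_len):
    """(min, max) relative distance from terminal-prompt tokens back to context tokens."""
    context_ids = position_ids[:context_len]
    prompt_ids = position_ids[context_len:]
    return prompt_ids[0] - context_ids[-1], prompt_ids[-1] - context_ids[0]

# 8K context, hypothetical 32-token terminal prompt, 64K target window.
ids = endprompt_position_ids(context_len=8192, prompt_len=32, target_len=65536)
shortest, longest = cross_segment_distances(ids, context_len=8192)
```

The resulting sequence is physically 8,224 tokens long, yet attention from the terminal prompt to the context spans relative distances from 57,313 up to 65,535, while attention within each segment still covers the local distances. In Hugging Face Transformers, the `forward` method of LLaMA-family models accepts a `position_ids` tensor, which is one place such indices could be supplied.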
In practice
- Extend LLaMA-family models from 8K to 64K context.
- Outperform LCEG, LongLoRA, and full-length fine-tuning on the LongBench and RULER averages.
- Reduce memory by 52% and accelerate training by 1.41x vs. full fine-tuning.
Topics
- EndPrompt
- Context Window Extension
- Rotary Position Embedding
- Position Interpolation
- Sparse Positional Supervision
Best for: NLP Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, AI Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CL updates on arXiv.org.