Context Compression Is Not One Thing: Readable Symbolic Re-expression vs. Coherent Summary at Matched Budget
Summary
Telegraph English is a context compression technique for multi-hop question answering with small language models. It rewrites retrieved passages into structured entity-relation statements, a readable symbolic format intended to preserve reasoning evidence at a lower token cost. In controlled experiments on the MuSiQue, TwoWiki, and HotpotQA datasets, Telegraph English outperformed three matched-budget compression baselines (character-level deletion, truncation, and random sub-sampling) by 13 to 20 F1 percentage points across all three datasets. On the most challenging dataset, it also outperformed a coherent prose summary generated by the same encoder. A pre-registered hypothesis that the advantage would grow with reasoning depth was not supported; the result was null. These results suggest that, within the same token budget, readable symbolic re-expression preserves entity content more densely than natural-language text or coherent summarization.
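The gains above are stated in F1 percentage points. The summary does not say which F1 variant was used; the sketch below assumes the token-overlap F1 that is customary for extractive QA benchmarks, with deliberately simple normalization (lowercasing and whitespace splitting). The function name `token_f1` is hypothetical.

```python
from collections import Counter


def token_f1(prediction: str, reference: str) -> float:
    """Token-overlap F1 between a predicted answer and a reference answer.

    Assumption: lowercase + whitespace tokenization; real evaluation scripts
    usually also strip punctuation and articles.
    """
    pred = prediction.lower().split()
    ref = reference.lower().split()
    if not pred or not ref:
        # Both empty counts as a match; one empty counts as a miss.
        return float(pred == ref)
    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


exact = token_f1("Pierre Curie", "Pierre Curie")
partial = token_f1("Marie Curie", "Pierre Curie")
miss = token_f1("Warsaw", "Paris")
```

On this 0 to 1 scale, a gain of 13 percentage points corresponds to a difference of 0.13 in mean F1 over the evaluation set.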
Key takeaway
Machine learning engineers optimizing context windows for small language models on multi-hop question answering should investigate symbolic re-expression. In the reported experiments, a structured format like Telegraph English, which converts passages into entity-relation statements, outperformed deletion, truncation, and random sub-sampling baselines by 13 to 20 F1 percentage points at matched token budget. The approach may offer a more token-efficient way to preserve critical reasoning evidence than truncation or, on the hardest dataset tested, coherent summarization.
Key insights
At matched token budgets, Telegraph English's symbolic re-expression preserves more usable evidence for multi-hop QA than the natural-language compression baselines it was tested against.
Principles
- Symbolic re-expression enhances information density.
- Structured entity-relation formats improve context preservation.
- Token budget matching is vital for compression evaluation.
Method
Telegraph English rewrites retrieved passages into structured entity-relation statements, creating a readable symbolic format that preserves reasoning evidence at lower token cost for multi-hop QA.
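To make the idea of entity-relation statements concrete, here is a minimal sketch. The actual Telegraph English syntax is not given in this summary, so the `SUBJECT ; relation ; OBJECT` line format, the `Fact` type, and the `render_facts` helper are all hypothetical. The facts are written by hand; in the described method an encoder model would produce them from the retrieved passage.

```python
from typing import Iterable, NamedTuple


class Fact(NamedTuple):
    """One entity-relation statement (hypothetical schema)."""
    subject: str
    relation: str
    obj: str


def render_facts(facts: Iterable[Fact]) -> str:
    """Render each fact as one compact line: 'SUBJECT ; relation ; OBJECT'."""
    return "\n".join(f"{f.subject} ; {f.relation} ; {f.obj}" for f in facts)


# Illustrative passage and hand-written facts (not taken from any dataset).
passage = (
    "Marie Curie, who was born in Warsaw in 1867, later moved to Paris, "
    "where she married Pierre Curie in 1895."
)
facts = [
    Fact("Marie Curie", "born_in", "Warsaw"),
    Fact("Marie Curie", "born_year", "1867"),
    Fact("Marie Curie", "spouse", "Pierre Curie"),
]
compressed = render_facts(facts)
print(compressed)
```

Each line keeps the entities and the relation linking them, which is the evidence a multi-hop question needs to chain across passages, while dropping connective prose.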
In practice
- Implement symbolic re-expression for context compression.
- Employ entity-relation statements to reduce token costs.
- Benchmark compression against matched-budget baselines.
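The three baselines named in the summary can be sketched as follows, so that any new compressor is compared at the same budget. This is an illustration under stated assumptions, not the paper's implementation: whitespace tokens stand in for the model tokenizer, the character-level deletion variant is matched on character count for simplicity, and all function names are hypothetical.

```python
import random
from typing import List


def truncate(tokens: List[str], budget: int) -> List[str]:
    """Keep the first `budget` tokens."""
    return tokens[:budget]


def random_subsample(tokens: List[str], budget: int, seed: int = 0) -> List[str]:
    """Keep `budget` randomly chosen tokens, preserving their original order."""
    if budget >= len(tokens):
        return list(tokens)
    rng = random.Random(seed)
    keep = sorted(rng.sample(range(len(tokens)), budget))
    return [tokens[i] for i in keep]


def delete_characters(text: str, char_budget: int, seed: int = 0) -> str:
    """Delete random characters until `char_budget` characters remain.

    Assumption: the budget is expressed in characters here; a faithful
    replication would re-tokenize and match on model tokens instead.
    """
    if char_budget >= len(text):
        return text
    rng = random.Random(seed)
    keep = sorted(rng.sample(range(len(text)), char_budget))
    return "".join(text[i] for i in keep)


text = "Marie Curie was born in Warsaw and later married Pierre Curie in Paris"
tokens = text.split()
budget = 5

truncated = truncate(tokens, budget)
sampled = random_subsample(tokens, budget)
deleted = delete_characters(text, 20)
```

Fixing the seed keeps the random baselines reproducible, so score differences reflect the compression method and not sampling noise.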
Topics
- Context Compression
- Symbolic Re-expression
- Multi-hop QA
- Small Language Models
- Entity-Relation Extraction
- F1 Score
Best for: AI Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Computation and Language.