Context Compression Is Not One Thing: Readable Symbolic Re-expression vs. Coherent Summary at Matched Budget

Source: Computation and Language · Field: Technology & Digital, Artificial Intelligence & Machine Learning, Natural Language Processing · Depth: Expert, quick

Summary

Telegraph English is a new context compression technique for multi-hop question answering with small language models. It is a readable symbolic format that rewrites retrieved passages into structured entity-relation statements, aiming to preserve reasoning evidence at a lower token cost. In controlled experiments on the MuSiQue, TwoWiki, and HotpotQA datasets, Telegraph English outperformed three matched-budget compression baselines (character-level deletion, truncation, and random sub-sampling) by 13 to 20 F1 percentage points on all three datasets. On the most challenging dataset, it also outperformed a coherent prose summary generated by the same encoder. A pre-registered hypothesis that the advantage would grow with reasoning depth was not supported: the result was null. Together, these results suggest that readable symbolic re-expression preserves entity content more densely than natural language or a coherent summary at the same token budget.
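To make the comparison concrete, here is a minimal sketch of the three matched-budget baselines named in the summary. It is not the authors' implementation: the function names, the whitespace tokenization, the fixed random seed, and the use of a character keep-ratio as a stand-in for a token budget are all assumptions made for illustration.

```python
import random


def truncate(tokens, budget):
    """Truncation baseline: keep only the first `budget` tokens."""
    return tokens[:budget]


def random_subsample(tokens, budget, seed=0):
    """Random sub-sampling baseline: keep `budget` tokens chosen at random,
    preserving their original order."""
    if budget >= len(tokens):
        return list(tokens)
    rng = random.Random(seed)
    keep = sorted(rng.sample(range(len(tokens)), budget))
    return [tokens[i] for i in keep]


def char_delete(text, keep_ratio, seed=0):
    """Character-level deletion baseline: drop characters at random, keeping
    round(len(text) * keep_ratio) of them in their original order."""
    rng = random.Random(seed)
    n_keep = round(len(text) * keep_ratio)
    keep = sorted(rng.sample(range(len(text)), n_keep))
    return "".join(text[i] for i in keep)


# Illustrative passage; whitespace splitting stands in for a real tokenizer.
passage = "Marie Curie was born in Warsaw . She won the Nobel Prize in Physics in 1903 ."
tokens = passage.split()
budget = 8

print(" ".join(truncate(tokens, budget)))
print(" ".join(random_subsample(tokens, budget)))
print(char_delete(passage, keep_ratio=0.5))
```

In a real evaluation the budget would be measured with the reader model's own tokenizer, so that every method is compared at the same token cost.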

Key takeaway

Machine learning engineers optimizing context windows in small language models for multi-hop question answering should investigate symbolic re-expression techniques. A structured format like Telegraph English, which converts passages into entity-relation statements, outperformed matched-budget deletion, truncation, and sub-sampling baselines by 13 to 20 F1 percentage points in the reported experiments. It offers a more token-efficient way to preserve critical reasoning evidence than truncation and, on the hardest dataset tested, than a coherent summary.

Key insights

At matched token budgets, Telegraph English's symbolic re-expression compresses context for multi-hop QA more effectively than natural-language compression.

Method

Telegraph English rewrites retrieved passages into structured entity-relation statements, creating a readable symbolic format that preserves reasoning evidence at lower token cost for multi-hop QA.
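The idea can be illustrated with a small sketch. The source does not specify the actual Telegraph English syntax, so the `Fact` type, the `render` format, and the hand-written facts below are hypothetical; in the method described, an encoder model produces the rewrite, and a real tokenizer would replace the whitespace token count used here.

```python
from typing import NamedTuple


class Fact(NamedTuple):
    """One entity-relation statement (hypothetical structure)."""
    subject: str
    relation: str
    obj: str


def render(facts):
    """Render facts as one compact, readable statement per line.
    This line format is an assumption, not the published syntax."""
    return "\n".join(f"{f.subject} {f.relation} {f.obj}" for f in facts)


def count_tokens(text):
    """Whitespace token count; a stand-in for a real tokenizer."""
    return len(text.split())


passage = (
    "Marie Curie, who was born in the city of Warsaw, later went on to "
    "win the Nobel Prize in Physics in the year 1903."
)

# Hand-written here for illustration; an encoder model would produce these.
facts = [
    Fact("Marie Curie", "born_in", "Warsaw"),
    Fact("Marie Curie", "won", "Nobel Prize in Physics"),
    Fact("Nobel Prize in Physics", "year", "1903"),
]

compressed = render(facts)
print(compressed)
print(count_tokens(passage), "->", count_tokens(compressed), "tokens")
```

The point of the sketch is that the entities a multi-hop question depends on (names, places, dates) survive intact while connective prose is dropped, which is the property the summary attributes to the format.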

Best for: AI Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Computation and Language.