The Perplexity Paradox: Why Code Compresses Better Than Math in LLM Prompts

Source: cs.CL updates on arXiv.org · Field: Technology & Digital (Artificial Intelligence & Machine Learning, Software Development & Engineering, Data Science & Analytics) · Depth: Advanced, extended

Summary

This paper investigates the "perplexity paradox" in prompt compression for Large Language Models (LLMs): code generation tasks tolerate aggressive compression (r ≥ 0.6), while chain-of-thought (CoT) reasoning tasks degrade gradually. The study validates this task-dependent compression hypothesis across six code benchmarks (HumanEval, MBPP, HumanEval+, and MultiPL-E in Python, JavaScript, and Java) and four reasoning benchmarks (GSM8K, MATH, ARC-Challenge, MMLU-STEM). A per-token perplexity analysis of 723 tokens shows the mechanism: code syntax tokens, which the model finds surprising, have high perplexity and are preserved, while numerical values in math problems, which follow predictable patterns, have low perplexity and are pruned despite being task-critical. In a signature preservation experiment, injecting function signatures recovered 34 percentage points of pass rate (5.3% baseline → 39.3%). The research also introduces TAAC (Task-Aware Adaptive Compression), an algorithm that dynamically adjusts compression based on predicted quality degradation. TAAC achieves a 22% cost reduction with 96% quality preservation, outperforming fixed-ratio compression by 7%.

Key takeaway

For AI Engineers and Research Scientists optimizing LLM inference costs, understanding the perplexity paradox is crucial. Perplexity-based prompt compression may be silently degrading performance on mathematical reasoning tasks, because numerical values look predictable to the model and are pruned even though the task depends on them. Implement task-aware compression algorithms like TAAC, which dynamically adjust compression ratios based on task type and predicted quality, to reduce cost (22% in the paper) while maintaining high quality (96% preservation). Prioritize preserving task-critical tokens regardless of their perplexity, especially numerical values in math problems and function signatures in code, to avoid significant performance drops.

Key insights

Perplexity-based prompt compression treats linguistic surprise as a proxy for task importance, and the two are misaligned: it preserves code syntax but prunes task-critical numbers in math problems.
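
The misalignment can be illustrated with a toy pruner. This is a minimal sketch, not the paper's implementation: the function names, the `keep_ratio` parameter (fraction of tokens retained), and the per-token surprisal values are all hypothetical and hand-assigned for illustration. The sketch keeps the highest-surprisal tokens and optionally protects tokens that a task-specific rule marks as critical.

```python
import math
import re


def is_number(token):
    """Return True for plain integer or decimal tokens such as '12' or '3.5'."""
    return re.fullmatch(r"-?\d+(\.\d+)?", token) is not None


def prune_by_surprisal(tokens, surprisals, keep_ratio, protect=None):
    """Keep the highest-surprisal tokens, preserving their original order.

    protect: optional predicate; tokens for which it returns True are
    always kept and count against the same token budget.
    """
    if len(tokens) != len(surprisals):
        raise ValueError("tokens and surprisals must have the same length")
    budget = max(1, math.ceil(keep_ratio * len(tokens)))
    protected = {i for i, tok in enumerate(tokens) if protect and protect(tok)}
    # Rank the unprotected tokens from most to least surprising.
    rest = sorted(
        (i for i in range(len(tokens)) if i not in protected),
        key=lambda i: -surprisals[i],
    )
    remaining_budget = max(0, budget - len(protected))
    keep = protected | set(rest[:remaining_budget])
    return [tokens[i] for i in sorted(keep)]


# Hypothetical surprisal values (assumption): the numbers look predictable.
tokens = ["Tom", "has", "12", "apples", "and", "buys", "7", "more"]
surprisals = [6.1, 1.2, 2.0, 5.4, 0.8, 4.9, 1.9, 3.5]

naive = prune_by_surprisal(tokens, surprisals, keep_ratio=0.5)
aware = prune_by_surprisal(tokens, surprisals, keep_ratio=0.5, protect=is_number)
print(naive)  # ['Tom', 'apples', 'buys', 'more']
print(aware)  # ['Tom', '12', 'apples', '7']
```

With the same token budget, the naive pruner drops both operands of the word problem, while the protected variant keeps them at the cost of less critical words.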

Method

TAAC classifies task type, estimates information density via perplexity coefficient of variation, and iteratively compresses with a quality predictor to ensure output quality remains above a user-defined floor.
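
The three steps above can be sketched as a small control loop. This is a hedged illustration under stated assumptions, not the authors' algorithm: the keyword-based `classify_task`, the `toy_predictor` with its made-up slopes, the step size, and the minimum keep ratio are all hypothetical. Only the overall shape (classify, measure the coefficient of variation, compress while predicted quality stays above the floor) comes from the description.

```python
import statistics


def perplexity_cv(surprisals):
    """Coefficient of variation (population std / mean) of per-token surprisal."""
    mean = statistics.fmean(surprisals)
    if mean == 0:
        return 0.0
    return statistics.pstdev(surprisals) / mean


def classify_task(prompt):
    """Toy keyword classifier (assumption): returns 'code' or 'reasoning'."""
    code_markers = ("def ", "class ", "return ", "function ", "import ")
    return "code" if any(marker in prompt for marker in code_markers) else "reasoning"


def choose_keep_ratio(prompt, surprisals, predict_quality, quality_floor,
                      step=0.1, min_ratio=0.3):
    """Lower the keep ratio step by step while predicted quality stays above the floor."""
    task = classify_task(prompt)
    cv = perplexity_cv(surprisals)
    ratio = 1.0  # start from the uncompressed prompt
    while True:
        candidate = round(ratio - step, 10)  # round to avoid float drift
        if candidate < min_ratio:
            break
        if predict_quality(task, cv, candidate) < quality_floor:
            break
        ratio = candidate
    return ratio


def toy_predictor(task, cv, keep_ratio):
    """Made-up quality model (assumption): reasoning degrades faster than code."""
    slope = 0.05 if task == "code" else 0.25
    return 1.0 - slope * (1.0 - keep_ratio) * (1.0 + cv)


flat = [2.0, 2.0, 2.0, 2.0]  # hypothetical surprisals with zero variation
code_ratio = choose_keep_ratio("def add(a, b):\n    return a + b", flat,
                               toy_predictor, quality_floor=0.96)
math_ratio = choose_keep_ratio("Tom has 12 apples and buys 7 more. How many?", flat,
                               toy_predictor, quality_floor=0.96)
print(code_ratio, math_ratio)  # 0.3 0.9
```

In a real system the predictor would be a learned model and the surprisals would come from a scoring LLM; the loop structure is what lets the quality floor, rather than a fixed ratio, decide how far each prompt is compressed.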

Best for: AI Engineer, AI Scientist, Research Scientist, AI Researcher, Machine Learning Engineer, Prompt Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CL updates on arXiv.org.