Self-Consistency from Only Two Samples: CoT-PoT Ensembling for Efficient LLM Reasoning

2025-05-13 · Source: cs.CL updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Software Development & Engineering · Depth: Expert, extended

Summary

CoT-PoT ensembling is a hybrid approach that improves both the accuracy and the efficiency of Large Language Model (LLM) reasoning by combining the Chain-of-Thought (CoT) and Program-of-Thought (PoT) modalities. It cuts the number of samples required for self-consistency (SC) by a factor of 9.3 and allows 78.6% of tasks to be solved with only two samples, a capability unmatched by prior SC techniques. The approach exploits the complementary strengths and differing error modes of CoT (step-by-step natural language) and PoT (symbolic program execution) reasoning. It covers two families of strategies: full sampling, which shows an overall average accuracy increase of 1.1% over CoT-only and PoT-only SC, and early stopping, which relies on a Bayesian model of cross-modal agreement.

Key takeaway

Machine learning engineers optimizing LLM inference should consider CoT-PoT ensembling for substantial efficiency gains without sacrificing accuracy. It can cut sampling costs by more than 9x and solve most problems with just two samples, which matters when scaling complex reasoning tasks. Use data-driven strategies when tuning for a specific model or domain, or a data-independent method such as CPAA for aggressive efficiency, especially with powerful models like DeepSeek R1.

Key insights

CoT-PoT ensembling improves LLM reasoning accuracy and efficiency by leveraging diverse, complementary reasoning modalities.

Principles

Diversity of reasoning paths, not just quantity, improves self-consistency.
CoT and PoT modalities exhibit complementary strengths and different error modes.
Cross-modal agreement provides a strong signal for early stopping in LLM inference.
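
As a minimal sketch of what a cross-modal agreement check could look like, the Python below compares the final number in a CoT transcript with the value computed by a PoT program. The helper names (cot_answer, pot_answer, cross_modal_agree), the last-number heuristic, and the convention that a PoT program stores its result in a variable named answer are assumptions made for illustration, not details taken from the paper.

```python
import re
from typing import Optional


def cot_answer(text: str) -> Optional[float]:
    """Heuristic (assumed): treat the last number in the CoT text as its answer."""
    numbers = re.findall(r"-?\d+(?:\.\d+)?", text)
    return float(numbers[-1]) if numbers else None


def pot_answer(program: str) -> Optional[float]:
    """Run a PoT program and read its `answer` variable (assumed convention).

    Emptying __builtins__ is NOT a sandbox. Model-generated code should be
    executed in an isolated process or container in any real deployment.
    """
    env = {"__builtins__": {}}
    try:
        exec(program, env)
        return float(env["answer"])
    except Exception:
        return None  # a crashed or malformed program yields no vote


def cross_modal_agree(cot_text: str, program: str, tol: float = 1e-6) -> bool:
    """True when both modalities produce an answer and the answers match."""
    a, b = cot_answer(cot_text), pot_answer(program)
    return a is not None and b is not None and abs(a - b) <= tol


cot = "There are 3 boxes of 4 apples, so 12 apples, plus 2 more gives 14."
pot = "boxes = 3\nper_box = 4\nanswer = boxes * per_box + 2"
print(cross_modal_agree(cot, pot))  # True
```

The two samples reach 14 by different routes (prose arithmetic versus executed code), which is why their agreement is more informative than two matching samples from the same modality.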

Method

Alternate between sampling a CoT solution and a PoT solution, and use a Bayesian model of cross-modal agreement to stop early once confidence exceeds a 0.975 threshold.
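
The loop described above can be sketched as follows. Only the CoT/PoT alternation and the 0.975 threshold come from the summary. The confidence function is a deliberately simple stand-in, not the paper's Bayesian model, and its parameters (prior, p_hit, p_collide), the max_samples cap, and the sampler callables are hypothetical.

```python
from collections import Counter
from typing import Callable, Tuple


def agreement_confidence(votes_for_leader: int, total_votes: int,
                         prior: float = 0.5, p_hit: float = 0.8,
                         p_collide: float = 0.05) -> float:
    """Toy posterior that the leading answer is correct (assumed model).

    p_hit: chance a sample lands on the leader if the leader is correct.
    p_collide: chance a sample lands on the leader if the leader is wrong.
    """
    misses = total_votes - votes_for_leader
    correct = prior * p_hit ** votes_for_leader * (1 - p_hit) ** misses
    wrong = (1 - prior) * p_collide ** votes_for_leader * (1 - p_collide) ** misses
    return correct / (correct + wrong)


def cot_pot_early_stop(sample_cot: Callable[[], str],
                       sample_pot: Callable[[], str],
                       threshold: float = 0.975,
                       max_samples: int = 10) -> Tuple[str, int]:
    """Alternate CoT and PoT samples; stop once confidence reaches threshold."""
    samplers = (sample_cot, sample_pot)
    answers = []
    leader = ""
    for i in range(max_samples):
        answers.append(samplers[i % 2]())  # even index: CoT, odd index: PoT
        leader, count = Counter(answers).most_common(1)[0]
        # Require at least one sample from each modality before stopping.
        if len(answers) >= 2 and agreement_confidence(count, len(answers)) >= threshold:
            break
    return leader, len(answers)


# Stub samplers standing in for real LLM calls (hypothetical).
print(cot_pot_early_stop(lambda: "42", lambda: "42"))  # ('42', 2)
```

With these illustrative parameters, one CoT sample and one PoT sample that agree already clear the 0.975 threshold, so the loop stops after two samples. When the first pair disagrees, sampling continues until one answer accumulates enough votes or the cap is reached.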

In practice

Use CoT-PoT ensembling to cut the number of samples needed for self-consistency by a factor of 9.3.
Rely on two-sample cross-modal agreement (one CoT, one PoT), which is enough to solve 78.6% of reasoning tasks.
Bootstrapping PoT from CoT can enhance smaller models' reasoning capabilities.

Topics

LLM Reasoning
Self-Consistency
Chain-of-Thought
Program-of-Thought
Ensembling
Early Stopping

Best for: Research Scientist, AI Engineer, AI Scientist, Machine Learning Engineer, NLP Engineer


Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CL updates on arXiv.org.