Self-Consistency from Only Two Samples: CoT-PoT Ensembling for Efficient LLM Reasoning
Summary
CoT-PoT ensembling is a novel hybrid approach that significantly enhances the accuracy and efficiency of Large Language Model (LLM) reasoning by combining Chain-of-Thought (CoT) and Program-of-Thought (PoT) modalities. This method drastically reduces the number of samples required for self-consistency (SC) by a factor of 9.3x, enabling 78.6% of tasks to be solved with only two samples, a capability unmatched by prior SC techniques. The approach leverages the complementary strengths and diverse error modes of CoT (natural language step-by-step) and PoT (symbolic program execution) reasoning. It includes both full sampling strategies, which show an overall average accuracy increase of 1.1% over CoT-only and PoT-only SC, and early-stopping strategies based on a Bayesian model of cross-modal agreement.
Key takeaway
For Machine Learning Engineers optimizing LLM inference, consider implementing CoT-PoT ensembling to achieve substantial efficiency gains without sacrificing accuracy. Your teams can reduce sampling costs by over 9x, solving most problems with just two samples, which is critical for scaling complex reasoning tasks. Explore data-driven strategies for specific models or domains, or use data-independent methods like CPAA for aggressive efficiency, especially with powerful models like DeepSeek R1.
Key insights
CoT-PoT ensembling improves LLM reasoning accuracy and efficiency by leveraging diverse, complementary reasoning modalities.
Principles
- Diversity of reasoning paths, not just quantity, improves self-consistency.
- CoT and PoT modalities exhibit complementary strengths and different error modes.
- Cross-modal agreement provides a strong signal for early stopping in LLM inference.
Method
Alternately sample one CoT and one PoT solution, applying a Bayesian model of cross-modal agreement to determine early stopping when confidence surpasses a 0.975 threshold.
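The alternating loop above can be sketched in a few lines of Python. This is a minimal illustration, not the paper's model: the per-modality accuracies in `ACCURACY`, the answer-space size `N_ANSWERS`, the uniform-error likelihood, and the `sample_cot` / `sample_pot` callables are all assumptions made for the example. Only the alternation and the 0.975 threshold come from the summary.

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple

# Assumed values for illustration only; the paper's fitted parameters may differ.
ACCURACY = {"cot": 0.8, "pot": 0.8}  # assumed chance that one sample is correct
N_ANSWERS = 10                       # assumed number of possible answers
THRESHOLD = 0.975                    # stopping threshold quoted in the summary


def posterior(samples: List[Tuple[str, str]]) -> Dict[str, float]:
    """Posterior over observed answers under a uniform prior.

    Assumes each sample is correct with its modality's accuracy and
    otherwise picks uniformly among the N_ANSWERS - 1 wrong answers.
    """
    def likelihood(truth: object) -> float:
        prob = 1.0
        for modality, answer in samples:
            acc = ACCURACY[modality]
            prob *= acc if answer == truth else (1.0 - acc) / (N_ANSWERS - 1)
        return prob

    candidates = {answer for _, answer in samples}
    scores = {c: likelihood(c) for c in candidates}
    # Probability mass of answers that no sample has produced so far.
    unseen = (N_ANSWERS - len(candidates)) * likelihood(object())
    total = sum(scores.values()) + unseen
    return {c: s / total for c, s in scores.items()}


def cot_pot_early_stop(
    sample_cot: Callable[[], str],
    sample_pot: Callable[[], str],
    max_samples: int = 20,
) -> Tuple[str, int]:
    """Alternate CoT and PoT samples; stop once one answer is confident enough."""
    samplers = [("cot", sample_cot), ("pot", sample_pot)]
    samples: List[Tuple[str, str]] = []
    for i in range(max_samples):
        modality, sampler = samplers[i % 2]
        samples.append((modality, sampler()))
        if len(samples) >= 2:
            best, prob = max(posterior(samples).items(), key=lambda kv: kv[1])
            if prob >= THRESHOLD:
                return best, len(samples)
    # Budget exhausted: fall back to a plain majority vote.
    return Counter(a for _, a in samples).most_common(1)[0][0], len(samples)
```

Under these assumed numbers, one CoT and one PoT sample that agree already push the posterior above 0.975, so the loop stops after two samples; a disagreement keeps it sampling. In real use, `sample_cot` and `sample_pot` would wrap model calls and return a normalized final answer.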
In practice
- Use CoT-PoT ensembling to cut the number of samples needed for self-consistency by a reported factor of 9.3x.
- Expect two samples (one CoT, one PoT) to be enough for a reported 78.6% of reasoning tasks.
- Bootstrapping PoT from CoT can enhance smaller models' reasoning capabilities.
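Before relying on the two-sample shortcut, it may help to measure how often one CoT answer and one PoT answer agree on your own tasks. The sketch below is a hypothetical helper, not part of the paper: `normalize` and `two_sample_agreement_rate` are illustrative names, and the normalization rule (strip whitespace and thousands separators, compare numerically when possible) is an assumption.

```python
from typing import List, Tuple, Union


def normalize(answer: str) -> Union[float, str]:
    """Map an answer string to a comparable value (assumed normalization rule)."""
    text = answer.strip().replace(",", "")
    try:
        return float(text)   # "42" and "42.0" compare equal
    except ValueError:
        return text.lower()  # non-numeric answers compare case-insensitively


def two_sample_agreement_rate(pairs: List[Tuple[str, str]]) -> float:
    """Fraction of tasks where the CoT answer and the PoT answer agree."""
    if not pairs:
        raise ValueError("need at least one (cot_answer, pot_answer) pair")
    agree = sum(1 for cot, pot in pairs if normalize(cot) == normalize(pot))
    return agree / len(pairs)


# Made-up example answers: three of the four pairs agree after normalization.
pairs = [("42", "42.0"), ("1,000", "1000"), ("7", "8"), ("Yes", " yes ")]
rate = two_sample_agreement_rate(pairs)
```

An agreement rate far below the 78.6% quoted above would suggest that the two-sample shortcut applies less often for that model or domain.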
Topics
- LLM Reasoning
- Self-Consistency
- Chain-of-Thought
- Program-of-Thought
- Ensembling
- Early Stopping
Best for: Research Scientist, AI Engineer, AI Scientist, Machine Learning Engineer, NLP Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CL updates on arXiv.org.