Synthetic Contrastive Reasoning for Multi-Table Q&A
Summary
A new approach, Synthetic Contrastive Reasoning, addresses the lack of reasoning supervision in multi-table question answering (Q&A) by creating a dataset of contrastive reasoning traces. Researchers generated validated positive traces using GPT-4o and plausible but flawed negative traces with Gemini 2.0 Flash, deliberately employing heterogeneous LLMs to enhance the contrastive signal. These preference pairs were then used to fine-tune open-weight LLMs, including Qwen3-14B, Mistral-8B, and Llama-3.1-8B, via Contrastive Preference Optimization (CPO). Compared with standard Q&A supervised fine-tuning, the method yielded absolute average improvements ranging from 9.7% to 16.3%, and gains of up to 21 percentage points on the MMQA benchmark. CPO also consistently outperformed Direct Preference Optimization (DPO) and the Table-R1 model, and generalized better to three-table Q&A tasks. A new BIRD-derived evaluation set was also constructed, featuring full-table evidence and semantic verification.
Key takeaway
Machine learning engineers developing LLMs for multi-table question answering should consider adopting Contrastive Preference Optimization (CPO) with synthetic reasoning traces. This method, which leverages both correct and plausibly incorrect reasoning paths generated by heterogeneous LLMs, consistently outperforms standard supervised fine-tuning and DPO. Implementing CPO can improve a model's accuracy and generalization on complex, multi-hop reasoning over structured data, even for tasks involving more tables than were seen during training.
Key insights
Synthetic contrastive reasoning traces and CPO significantly boost multi-table Q&A performance in LLMs.
Principles
- Explicit reasoning traces improve complex multi-step problem solving.
- Using different LLM generators for positive and negative traces enhances the contrastive signal.
- Learning from plausible errors strengthens the model's reasoning robustness.
Method
Generate positive reasoning traces with GPT-4o and contrastive negative traces with Gemini 2.0 Flash, then apply CPO for fine-tuning.
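To make the fine-tuning objective concrete, here is a minimal sketch of a per-example CPO loss, assuming the commonly published formulation: a sigmoid preference term on the policy's log-probabilities (with no reference model, unlike DPO) plus a negative log-likelihood term on the preferred trace. The function names, the `beta` value, and the `nll_weight` parameter are illustrative assumptions; the summary does not state the hyperparameters or training code the researchers used.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def cpo_loss(
    logp_chosen: float,
    logp_rejected: float,
    beta: float = 0.1,        # assumed value, not taken from the source
    nll_weight: float = 1.0,  # assumed value, not taken from the source
) -> float:
    """Per-example CPO loss from sequence log-probabilities under the policy.

    logp_chosen:   log pi(positive trace | question, tables)
    logp_rejected: log pi(negative trace | question, tables)
    """
    # Preference term: push the positive trace above the negative one.
    preference_term = -log_sigmoid(beta * (logp_chosen - logp_rejected))
    # NLL term: keep the policy close to the preferred (positive) traces.
    nll_term = -logp_chosen
    return preference_term + nll_weight * nll_term
```

In this sketch the loss falls when the model assigns a higher log-probability to the positive trace than to the negative one, and the NLL term plays the role that the reference model plays in DPO by anchoring the policy to the preferred data.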
In practice
- Implement CPO for multi-table Q&A tasks.
- Use different LLMs for generating positive and negative reasoning examples.
- Employ LLM-as-judge for rigorous trace validation and filtering.
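The practices above can be sketched as a small filtering step that turns candidate traces into preference pairs. This is a hypothetical illustration, not the researchers' pipeline: the `Candidate` and `PreferencePair` types, the `build_pairs` function, and the rule "keep a pair only if the judge accepts the positive trace and rejects the negative one" are assumptions. The `toy_judge` below is a stand-in for a real LLM-as-judge call.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Candidate:
    question: str
    positive_trace: str  # e.g. produced by one LLM
    negative_trace: str  # e.g. produced by a different LLM
    gold_answer: str


@dataclass
class PreferencePair:
    prompt: str
    chosen: str
    rejected: str


# A judge takes (question, trace, gold_answer) and returns True if the
# trace is judged correct. In practice this would wrap an LLM call.
Judge = Callable[[str, str, str], bool]


def toy_judge(question: str, trace: str, gold_answer: str) -> bool:
    """Stand-in judge: accepts a trace only if it ends with the gold answer."""
    return trace.strip().endswith(gold_answer)


def build_pairs(candidates: Iterable[Candidate], judge: Judge) -> List[PreferencePair]:
    """Keep pairs whose positive trace is validated and whose negative is flawed."""
    pairs: List[PreferencePair] = []
    for c in candidates:
        positive_ok = judge(c.question, c.positive_trace, c.gold_answer)
        negative_ok = judge(c.question, c.negative_trace, c.gold_answer)
        # Drop the example if the "positive" is wrong or the "negative"
        # accidentally reaches the right answer: either weakens the contrast.
        if positive_ok and not negative_ok:
            pairs.append(PreferencePair(c.question, c.positive_trace, c.negative_trace))
    return pairs
```

The resulting `prompt` / `chosen` / `rejected` records follow the layout that preference-optimization trainers typically expect, so they could be fed to a CPO or DPO training loop after serialization.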
Topics
- Multi-table Q&A
- Contrastive Preference Optimization
- LLM Fine-tuning
- Synthetic Reasoning Traces
- Structured Data Reasoning
- MMQA Benchmark
Best for: Research Scientist, AI Engineer, AI Scientist, Machine Learning Engineer, NLP Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.