Synthetic Contrastive Reasoning for Multi-Table Q&A

2026-06-06 · Source: cs.AI updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics · Depth: Expert, extended

Summary

A new approach, Synthetic Contrastive Reasoning, addresses the lack of reasoning supervision in multi-table question answering (Q&A) by building a dataset of contrastive reasoning traces. The researchers generated validated positive traces with GPT-4o and plausible but flawed negative traces with Gemini 2.0 Flash, deliberately using heterogeneous LLMs to strengthen the contrastive signal. The resulting preference pairs were used to fine-tune open-weight LLMs, including Qwen3-14B, Mistral-8B, and Llama-3.1-8B, via Contrastive Preference Optimization (CPO). Compared with standard Q&A supervised fine-tuning, the method yielded average absolute improvements of 9.7% to 16.3%, and gains of up to 21 percentage points on the MMQA benchmark. CPO also consistently outperformed Direct Preference Optimization (DPO) and the Table-R1 model, and it generalized better to three-table Q&A tasks. The authors additionally constructed a new BIRD-derived evaluation set featuring full-table evidence and semantic verification.

Key takeaway

Machine learning engineers developing LLMs for multi-table question answering should consider adopting Contrastive Preference Optimization (CPO) with synthetic reasoning traces. The method pairs correct reasoning paths with plausibly incorrect ones, generated by heterogeneous LLMs, and consistently outperforms both standard supervised fine-tuning and DPO. In the reported results, CPO improves accuracy and generalization on complex, multi-hop reasoning over structured data, including tasks that involve more tables than were seen during training.

Key insights

Synthetic contrastive reasoning traces and CPO significantly boost multi-table Q&A performance in LLMs.

Principles

Explicit reasoning traces improve complex multi-step problem solving.
Diverse LLM generators for positive/negative traces enhance contrastive signal.
Learning from plausible errors strengthens a model's reasoning robustness.

Method

Generate positive reasoning traces with GPT-4o and contrastive negative traces with Gemini 2.0 Flash, then apply CPO for fine-tuning.
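
To make the fine-tuning step concrete, here is a minimal sketch of a per-example CPO loss. The summary does not give the paper's exact objective or hyperparameters, so this follows the commonly published CPO formulation: a preference term on the log-probability gap between the chosen and rejected trace, plus a negative log-likelihood term on the chosen trace, with no reference model. The function names, the `beta` value, and the `nll_weight` parameter are illustrative assumptions.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def cpo_loss(
    logp_chosen: float,
    logp_rejected: float,
    beta: float = 0.1,        # assumed value, not taken from the paper
    nll_weight: float = 1.0,  # assumed value, not taken from the paper
) -> float:
    """Per-example CPO loss (sketch).

    logp_chosen / logp_rejected are the policy's summed token
    log-probabilities of the positive and negative reasoning trace,
    conditioned on the same multi-table question.
    """
    # Preference term: push the chosen trace above the rejected one.
    prefer = -log_sigmoid(beta * (logp_chosen - logp_rejected))
    # NLL term: keep the policy close to the validated positive trace.
    nll = -logp_chosen
    return prefer + nll_weight * nll
```

Unlike DPO, this objective needs no frozen reference model, which is one reason it is cheaper to run; the NLL term takes over the regularizing role.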

In practice

Implement CPO for multi-table Q&A tasks.
Use different LLMs for generating positive and negative reasoning examples.
Employ LLM-as-judge for rigorous trace validation and filtering.
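
The three practices above can be sketched as a small pair-building routine. This is an illustration under stated assumptions, not the authors' pipeline: `gen_positive`, `gen_negative`, and `judge` are hypothetical callables standing in for calls to two different LLMs and an LLM-as-judge, and the filtering rule (keep a pair only when the judge accepts the positive trace and rejects the negative one) is an assumed criterion.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List


@dataclass
class PreferencePair:
    prompt: str
    chosen: str    # validated positive reasoning trace
    rejected: str  # plausible but flawed negative trace


def build_pairs(
    examples: Iterable[Dict[str, str]],
    gen_positive: Callable[[str], str],      # hypothetical: wraps the positive-trace LLM
    gen_negative: Callable[[str], str],      # hypothetical: wraps a different LLM
    judge: Callable[[str, str, str], bool],  # hypothetical: LLM-as-judge verdict
) -> List[PreferencePair]:
    """Build preference pairs, keeping only those the judge confirms."""
    pairs = []
    for ex in examples:
        prompt, gold = ex["prompt"], ex["answer"]
        pos = gen_positive(prompt)
        neg = gen_negative(prompt)
        # Assumed filter: positive must be judged correct, negative incorrect.
        if judge(prompt, pos, gold) and not judge(prompt, neg, gold):
            pairs.append(PreferencePair(prompt, pos, neg))
    return pairs


# Toy stand-ins so the sketch runs without any API access.
def stub_positive(prompt: str) -> str:
    return "Join orders with customers on customer_id, then sum. answer: 42"


def stub_negative(prompt: str) -> str:
    return "Sum the orders table without the join. answer: 17"


def stub_judge(prompt: str, trace: str, gold: str) -> bool:
    return trace.strip().endswith(f"answer: {gold}")


examples = [{"prompt": "Total spend of premium customers?", "answer": "42"}]
pairs = build_pairs(examples, stub_positive, stub_negative, stub_judge)
```

Passing the generators in as callables keeps the two model choices independent, so the positive and negative sources can be swapped without touching the filtering logic.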

Topics

Multi-table Q&A
Contrastive Preference Optimization
LLM Fine-tuning
Synthetic Reasoning Traces
Structured Data Reasoning
MMQA Benchmark

Best for: Research Scientist, AI Engineer, AI Scientist, Machine Learning Engineer, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.