DiARC: Distinguishing Positive and Negative Samples Helps Improving ARC-like Reasoning Ability of Large Language Models
Summary
DiARC is a method for improving the Abstraction and Reasoning Corpus (ARC)-like reasoning of large language models (LLMs). Existing LLM approaches either perform unsatisfactorily with open-source models or are costly with closed-source ones, and DiARC aims to go beyond traditional data augmentation. Its premise is that solving ARC-like problems requires not only supervision from positive samples but also the ability to distinguish negative ones. Drawing on preference alignment, DiARC constructs preference pairs and introduces three techniques for generating the negative samples: output-level visual transformations, DSL-level rule inversion, and task-specific rule editing. These techniques produce informative "near-miss" alternatives while preserving the original demonstrations. Experiments show that DiARC consistently improves performance across various ARC-like benchmarks. The project's code is publicly available at https://github.com/szu-tera/DiARC.
Key takeaway
Research scientists developing LLMs for complex reasoning tasks such as those in the Abstraction and Reasoning Corpus should build negative-sample discrimination into their training methodology. According to the paper, data augmentation alone is not enough. Use preference alignment so that the model learns to differentiate correct outputs from "near-miss" incorrect ones, and construct informative negative samples with output-level visual transformations, DSL-level rule inversion, or task-specific rule editing. The reported experiments show consistent gains across various ARC-like benchmarks.
Key insights
Distinguishing negative samples via preference alignment improves LLM reasoning on ARC-like tasks.
Principles
- Reasoning improvement requires distinguishing negative samples.
- Preference alignment can enhance model reasoning.
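The summary says only that DiARC draws inspiration from preference alignment; it does not name a specific training objective. As one common instance of the idea, the sketch below computes a DPO-style (Direct Preference Optimization) loss for a single preference pair. The function name `dpo_loss`, its parameters, and the default `beta=0.1` are illustrative assumptions, not details taken from the paper.

```python
import math

def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """DPO-style loss for one preference pair (assumed objective, not confirmed by the source).

    All four inputs are sequence log-probabilities: the first two from the
    model being trained, the last two from a frozen reference model.
    """
    # Margin: how much more the trained model favors the chosen (correct)
    # output over the rejected (near-miss) one, relative to the reference.
    margin = (logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected)
    z = beta * margin
    # -log(sigmoid(z)), computed in a numerically stable way for both signs of z.
    if z >= 0:
        return math.log1p(math.exp(-z))
    return -z + math.log1p(math.exp(z))
```

The loss equals log 2 when the model has no preference beyond the reference, and it shrinks as the model assigns relatively more probability to the correct output than to the near-miss.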
Method
DiARC constructs preference pairs by generating negative samples through output-level visual transformations, DSL-level rule inversion, or task-specific rule editing.
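To make the first technique concrete, here is a minimal sketch of how output-level visual transformations could turn a correct output grid into near-miss negatives and pair them with the original. The choice of transformations (flips and a rotation), the helper names, and the `prompt`/`chosen`/`rejected` record layout are assumptions for illustration; the paper's actual transformation set and data format may differ.

```python
from typing import Dict, List

Grid = List[List[int]]

def flip_horizontal(grid: Grid) -> Grid:
    """Mirror each row left to right."""
    return [row[::-1] for row in grid]

def flip_vertical(grid: Grid) -> Grid:
    """Reverse the order of the rows."""
    return [row[:] for row in grid[::-1]]

def rotate_90(grid: Grid) -> Grid:
    """Rotate the grid 90 degrees clockwise."""
    return [list(row) for row in zip(*grid[::-1])]

# Hypothetical transformation set; not taken from the paper.
TRANSFORMS = [flip_horizontal, flip_vertical, rotate_90]

def make_preference_pairs(prompt: str, correct: Grid) -> List[Dict]:
    """Pair the correct output with each transformed (incorrect) variant."""
    pairs = []
    for transform in TRANSFORMS:
        negative = transform(correct)
        # A symmetric grid can survive a transform unchanged; such a
        # "negative" would actually be correct, so it is skipped.
        if negative != correct:
            pairs.append({
                "prompt": prompt,
                "chosen": correct,
                "rejected": negative,
                "transform": transform.__name__,
            })
    return pairs
```

Only the output side is altered here, so the task demonstrations carried in the prompt stay untouched, which matches the summary's point that negatives are created while preserving the original demonstrations.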
In practice
- Apply preference alignment to improve LLM reasoning.
- Generate near-miss negative samples for complex tasks.
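The summary does not describe how DSL-level rule inversion is defined. One plausible reading is that a task's generating program is edited so that a single rule is replaced by its opposite, which yields an output that is wrong but structurally close to the correct one. The toy DSL below (two shift operations, the `INVERSE` table, and every function name) is entirely hypothetical and serves only to illustrate that reading.

```python
from typing import Callable, Dict, List, Tuple

Grid = List[List[int]]

def shift_right(grid: Grid) -> Grid:
    """Shift every row one cell to the right, padding with 0."""
    return [[0] + row[:-1] for row in grid]

def shift_left(grid: Grid) -> Grid:
    """Shift every row one cell to the left, padding with 0."""
    return [row[1:] + [0] for row in grid]

# Hypothetical toy DSL: operation names mapped to grid functions.
OPS: Dict[str, Callable[[Grid], Grid]] = {
    "shift_right": shift_right,
    "shift_left": shift_left,
}

# Hypothetical inversion table: each rule and its opposite.
INVERSE: Dict[str, str] = {
    "shift_right": "shift_left",
    "shift_left": "shift_right",
}

def run(program: List[str], grid: Grid) -> Grid:
    """Apply the program's operations to the grid in order."""
    for name in program:
        grid = OPS[name](grid)
    return grid

def invert_rule(program: List[str], index: int) -> List[str]:
    """Return a copy of the program with the rule at `index` inverted."""
    return program[:index] + [INVERSE[program[index]]] + program[index + 1:]

def near_miss_pair(program: List[str], grid: Grid, index: int) -> Tuple[Grid, Grid]:
    """Return (chosen, rejected) outputs for the same input grid."""
    chosen = run(program, grid)
    rejected = run(invert_rule(program, index), grid)
    return chosen, rejected
```

For example, `near_miss_pair(["shift_right"], [[0, 1, 0]], 0)` pairs the correct output of the original program with the output of the program whose single rule has been inverted.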
Topics
- Large Language Models
- ARC Reasoning
- Preference Alignment
- Negative Sampling
- Data Augmentation
Best for: AI Scientist, Research Scientist, Machine Learning Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.