DiARC: Distinguishing Positive and Negative Samples Helps Improving ARC-like Reasoning Ability of Large Language Models

· Source: Artificial Intelligence · Field: Technology & Digital — Artificial Intelligence & Machine Learning · Depth: Expert, quick

Summary

DiARC is a method for improving how large language models (LLMs) solve Abstraction and Reasoning Corpus (ARC)-like tasks. Existing LLM approaches either perform unsatisfactorily with open-source models or are costly with closed-source ones. Rather than relying on data augmentation alone, DiARC argues that solving ARC-like problems requires not only supervision on positive samples but also the ability to distinguish negative samples. Drawing on preference alignment, it constructs preference pairs that contrast a correct output with an incorrect one. Negative samples are generated in three ways: output-level visual transformations, DSL-level rule inversion, and task-specific rule editing. Each produces an informative "near-miss" alternative while preserving the original demonstrations. The authors report that DiARC consistently improves performance across various ARC-like benchmarks. The project's code is publicly available at https://github.com/szu-tera/DiARC.

Key takeaway

Research scientists developing LLMs for complex reasoning tasks, such as those in the Abstraction and Reasoning Corpus, should build negative-sample distinction into their training methodology, because data augmentation on positive samples alone is not enough. Use preference alignment so that the model learns to tell a correct output from a "near-miss" incorrect one. Informative negative samples can be constructed with output-level visual transformations, DSL-level rule inversion, or task-specific rule editing. The paper reports consistent improvements across ARC-like benchmarks with this approach.

Key insights

Distinguishing negative samples via preference alignment improves LLM reasoning on ARC-like tasks.
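
To make "preference alignment" concrete, here is a minimal sketch of a pairwise preference loss. This summary does not say which objective DiARC optimizes, so the example assumes the widely used Direct Preference Optimization (DPO) loss purely for illustration. The function name `dpo_pair_loss`, its parameters, and the default `beta` are hypothetical and are not taken from the DiARC code.

```python
import math


def dpo_pair_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,
) -> float:
    """DPO loss for one (chosen, rejected) pair, given sequence log-probabilities.

    Assumption: DPO is used here only as a familiar example of a preference
    objective; the summarized source does not name the loss DiARC uses.
    """
    # How much more the policy favors each answer than the reference model does.
    chosen_gain = policy_chosen_logp - ref_chosen_logp
    rejected_gain = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_gain - rejected_gain)

    # -log(sigmoid(margin)), written in a numerically stable form.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))
```

The loss equals log 2 when the policy matches the reference model, falls below it as the policy shifts probability toward the correct (chosen) output, and rises above it when the policy prefers the near-miss (rejected) output.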

Method

DiARC constructs preference pairs by generating negative samples through output-level visual transformations, DSL-level rule inversion, or task-specific rule editing.
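
As a rough illustration of the first technique only, the sketch below builds a preference pair by applying an output-level visual transformation to the correct output grid. It is a minimal sketch under stated assumptions, not the DiARC implementation: the specific transformations (flips, rotation, transpose), the record layout, and every name (`TRANSFORMS`, `make_negative`, `build_preference_pair`) are hypothetical. DSL-level rule inversion and task-specific rule editing are not shown, because this summary does not describe how they work.

```python
import random
from typing import Callable, Dict, List, Optional, Tuple

Grid = List[List[int]]


def flip_horizontal(grid: Grid) -> Grid:
    """Mirror the grid left to right."""
    return [list(reversed(row)) for row in grid]


def flip_vertical(grid: Grid) -> Grid:
    """Mirror the grid top to bottom."""
    return [list(row) for row in reversed(grid)]


def rotate_90(grid: Grid) -> Grid:
    """Rotate the grid 90 degrees clockwise."""
    return [list(row) for row in zip(*grid[::-1])]


def transpose(grid: Grid) -> Grid:
    """Swap rows and columns."""
    return [list(row) for row in zip(*grid)]


# Hypothetical set of output-level transformations (an assumption, not from the source).
TRANSFORMS: Dict[str, Callable[[Grid], Grid]] = {
    "flip_horizontal": flip_horizontal,
    "flip_vertical": flip_vertical,
    "rotate_90": rotate_90,
    "transpose": transpose,
}


def make_negative(correct: Grid, rng: random.Random) -> Optional[Tuple[str, Grid]]:
    """Return a transformed grid that differs from the correct output, if one exists."""
    candidates = [(name, fn(correct)) for name, fn in TRANSFORMS.items()]
    # A transformation that leaves the grid unchanged is not a negative sample.
    candidates = [(name, grid) for name, grid in candidates if grid != correct]
    if not candidates:
        return None
    return rng.choice(candidates)


def build_preference_pair(
    demonstrations: List[Dict[str, Grid]],
    test_input: Grid,
    correct_output: Grid,
    rng: random.Random,
) -> Optional[Dict[str, object]]:
    """Pair the correct output (chosen) with a near-miss (rejected).

    The demonstrations and test input are passed through untouched; only the
    rejected answer is altered.
    """
    negative = make_negative(correct_output, rng)
    if negative is None:
        return None
    name, rejected = negative
    return {
        "demonstrations": demonstrations,
        "test_input": test_input,
        "chosen": correct_output,
        "rejected": rejected,
        "negative_type": name,
    }
```

The transformed grid keeps the same cells and colors as the correct answer, which is what makes it a near miss rather than an obviously wrong output. Fully symmetric outputs yield no valid negative, so the builder returns `None` for them instead of emitting a pair whose two sides are identical.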

Best for: AI Scientist, Research Scientist, Machine Learning Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.