Scaling LLM Reasoning from Minimal Labels: A Semi-Supervised Framework with a Lightweight Verifier

2026-06-15 · Source: Artificial Intelligence · Field: Technology & Digital — Artificial Intelligence & Machine Learning · Depth: Expert, quick

Summary

A new semi-supervised framework scales Large Language Model (LLM) reasoning from minimal supervision by transforming reasoning verification into a data creation mechanism. This approach trains a lightweight reasoning-correctness classifier using only a few labeled samples to determine the validity of intermediate reasoning traces generated by an LLM. An entropy-based confidence threshold then filters out unreliable samples, with the remaining high-confidence reasoning traces used to fine-tune the model. Experiments on Verifiable Math Problems (Orca-Math subset) and Question Answering on Image Scene Graphs (GQA) with Visual Programming demonstrate that this method achieves accuracy comparable to systems utilizing 10-15x more labeled data. Ablation analyses confirm the critical roles of both the classifier and entropy filtering in enabling scalable, noise-resistant pseudo-labeling. This framework offers a practical path for constructing large-scale reasoning resources, moving towards autonomous reasoning systems with minimal human input.

Key takeaway

Machine Learning Engineers developing LLMs with scarce labeled data should consider a semi-supervised framework built around a lightweight reasoning verifier. By turning verification into a data creation mechanism, the approach reaches reasoning accuracy comparable to systems trained on 10-15x more labeled data. This reduces the need for expensive answer-level supervision and offers a path toward more autonomous reasoning systems.

Key insights

A semi-supervised framework scales LLM reasoning by using a lightweight verifier to generate high-confidence pseudo-labels from minimal human input.

Principles

Reasoning verification can create data.
Entropy filtering enhances pseudo-labeling.
Lightweight verifiers reduce supervision cost.

Method

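Train a lightweight classifier on a few labeled samples to verify LLM reasoning traces. Apply an entropy-based confidence threshold to discard traces the classifier is unsure about, keeping only the high-confidence ones. Fine-tune the LLM on the retained traces as pseudo-labels.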
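
The filtering step can be illustrated with a minimal sketch. It is not the authors' implementation: the summary does not give the exact entropy formula, the threshold value, or how traces judged incorrect are handled. The sketch assumes a binary verifier that outputs a probability of correctness, uses Shannon entropy in bits, and keeps only traces that are both low-entropy and judged correct. The names Trace, binary_entropy, select_pseudo_labels, and max_entropy are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Trace:
    """One LLM-generated reasoning trace with a verifier score (hypothetical structure)."""
    text: str
    p_correct: float  # assumed verifier output: probability the reasoning is valid


def binary_entropy(p: float) -> float:
    """Shannon entropy in bits of a Bernoulli(p) prediction: 0 is certain, 1 is maximally unsure."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


def select_pseudo_labels(traces, max_entropy=0.5):
    """Keep traces the verifier is confident about and judges correct.

    max_entropy is an illustrative threshold, not a value from the source.
    """
    kept = []
    for t in traces:
        confident = binary_entropy(t.p_correct) <= max_entropy
        judged_correct = t.p_correct >= 0.5
        if confident and judged_correct:
            kept.append(t)
    return kept


# Toy verifier scores: confident-correct, uncertain, confident-incorrect.
traces = [
    Trace("step-by-step solution A", 0.97),
    Trace("step-by-step solution B", 0.55),
    Trace("step-by-step solution C", 0.03),
]
selected = select_pseudo_labels(traces)
print([t.text for t in selected])
```

In this toy run only the first trace survives: the second is too uncertain, and the third is confidently judged incorrect. The retained traces would then serve as the fine-tuning set; lowering max_entropy trades pseudo-label volume for lower label noise.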

In practice

Construct large-scale reasoning resources.
Develop autonomous reasoning systems.
Reduce data labeling costs for LLM training.

Topics

Large Language Models
Semi-Supervised Learning
Reasoning Verification
Pseudo-Labeling
Data Efficiency
Orca-Math

Best for: Research Scientist, AI Engineer, AI Scientist, Machine Learning Engineer, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.