A Small-Scale System for Autoregressive Program Synthesis Enabling Controlled Experimentation

2026-02-13 · Source: Apple Machine Learning Research · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Software Development & Engineering · Depth: Advanced, quick

Summary

Cadmus is a small-scale system for autoregressive program synthesis that gives researchers a controlled environment for studying program completion and inductive reasoning. It comprises an integer virtual machine (VM), a diverse dataset of true programs, and an autoregressive transformer model trained for under $200. Researchers get fine-grained control over the training distribution and can inspect the model in detail, which is often cost-prohibitive with large language models (LLMs). On integer arithmetic program completion tasks in the system's domain-specific language (DSL), Cadmus models achieved 100% accuracy, outperforming GPT-5, which scored 95%. The work also shows that GPT-5 brings unknown priors to these tasks, a confounding factor for investigations in which the training set's relationship to the task must be fully understood.

Key takeaway

For research scientists investigating program synthesis or inductive reasoning, Cadmus offers a cost-effective and transparent alternative to large language models. It reaches high accuracy (100% on integer arithmetic completion tasks) while leaving you in full control of the training data and model internals, avoiding the confounding effect of unknown priors in larger, pre-trained models.

Key insights

Small, controlled program synthesis systems make it possible to study model reasoning and training distribution effects transparently.

Principles

Smaller models enable affordable, fine-grained experimental control.
Unknown priors in LLMs can confound research that depends on knowing how the training set relates to the task.

Method

Cadmus integrates an integer VM, a dataset of true programs, and a low-cost autoregressive transformer for controlled program synthesis research.
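
To make the setup concrete, here is a minimal sketch of how an integer VM and a completion-accuracy check could fit together. Everything in it is an illustrative assumption: the three-instruction stack language, the names run and completion_accuracy, and the execution-based scoring rule are hypothetical and are not the Cadmus DSL, VM, or evaluation protocol, none of which this summary specifies.

```python
from typing import Callable, List, Tuple

Program = List[str]


def run(program: Program) -> int:
    """Execute a toy integer stack program (hypothetical, not the Cadmus DSL)."""
    stack: List[int] = []
    for tok in program:
        if tok in ("ADD", "SUB", "MUL"):
            # Binary operators pop two operands; IndexError if the stack is short.
            b, a = stack.pop(), stack.pop()
            if tok == "ADD":
                stack.append(a + b)
            elif tok == "SUB":
                stack.append(a - b)
            else:
                stack.append(a * b)
        else:
            # Any other token must be an integer literal; ValueError otherwise.
            stack.append(int(tok))
    return stack[-1]


def completion_accuracy(
    cases: List[Tuple[Program, int]],
    complete: Callable[[Program], Program],
) -> float:
    """Fraction of prefixes whose completed program runs and yields the expected value.

    Assumed scoring rule: a completion counts as correct only if the full
    program executes without error and returns the expected integer.
    """
    hits = 0
    for prefix, expected in cases:
        candidate = prefix + complete(prefix)
        try:
            if run(candidate) == expected:
                hits += 1
        except (IndexError, ValueError):
            pass  # malformed programs count as failures
    return hits / len(cases)


# A hand-written stand-in for a trained model: always completes with ADD.
cases = [(["2", "3"], 5), (["4", "5"], 9)]
print(completion_accuracy(cases, lambda prefix: ["ADD"]))
```

In a setup like this, the completer would be the trained transformer, and because every program in the training set is generated and executed locally, the relationship between the training distribution and each test case can be checked directly.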

In practice

Use Cadmus for studying out-of-distribution representations.
Inspect and instrument models on complex reasoning tasks.

Topics

Autoregressive Program Synthesis
Small Language Models
Transformer Architecture
Out-of-Distribution Generalization
Inductive Reasoning

Best for: Research Scientist, AI Researcher, AI Scientist, Deep Learning Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Apple Machine Learning Research.