When Symbol Names Should Not Matter: A Logistic Theory of Fresh-Symbol Classification

Source: stat.ML updates on arXiv.org · Field: Technology & Digital (Artificial Intelligence & Machine Learning, Mathematics & Computational Sciences) · Depth: Expert, extended

Summary

This paper develops a logistic theory of fresh-symbol classification in transformers, asking how these models can reason over abstract symbols rather than concrete token names. The setting is fixed-label classification in which training and test examples share latent templates but may use disjoint vocabularies, so the model must learn decision rules that are invariant to symbol renaming. The core contribution is a decomposition of the learned predictor into an ideal template-level classifier plus a finite-sample perturbation caused by accidental token overlaps in the training data. The perturbation is quantified with a "colored collision graph," which tracks collision geometry and signed kernel weights, and the paper uses it to prove high-probability margin-transfer guarantees for fresh-symbol classification. The analysis extends template-based methods to logistic classification and refines scalar diversity conditions: vocabulary size controls the average collision rate, but collision geometry determines whether the ideal classification margin is preserved. Synthetic experiments illustrate the roles of regularization, sample size, and transformer-kernel structure.
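To make the setting concrete, here is a minimal Python sketch of fresh-symbol data. The template strings ("ABA", "ABB"), the vocabularies, and the helper names `instantiate` and `canonical_form` are illustrative assumptions, not taken from the paper. The sketch shows that two examples built from the same latent template but from disjoint vocabularies share a renaming-invariant signature.

```python
import random

def instantiate(template, vocab, rng):
    """Fill each distinct placeholder in `template` with a distinct symbol from `vocab`.

    Hypothetical helper: the paper's templates may be richer than strings.
    """
    slots = sorted(set(template))
    symbols = rng.sample(vocab, len(slots))  # distinct symbols, one per slot
    mapping = dict(zip(slots, symbols))
    return [mapping[s] for s in template]

def canonical_form(tokens):
    """Renaming-invariant signature: replace each token by its first-occurrence index."""
    first_seen = {}
    return tuple(first_seen.setdefault(t, len(first_seen)) for t in tokens)

rng = random.Random(0)
train_vocab = list("abcdefghij")  # symbols seen during training (assumed)
fresh_vocab = list("uvwxyz")      # disjoint symbols used only at test time (assumed)

train_example = instantiate("ABA", train_vocab, rng)
fresh_example = instantiate("ABA", fresh_vocab, rng)

# Same template, different names: the signatures agree even though no token is shared.
print(canonical_form(train_example), canonical_form(fresh_example))
```

A classifier that depends only on `canonical_form` would be exactly invariant to renaming; the paper's question is how close a learned predictor gets to that ideal when training tokens overlap by accident.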

Key takeaway

For AI Scientists and Research Scientists developing or evaluating transformer models for symbolic reasoning, the "colored collision graph" is the key object to understand. A model's ability to generalize to unseen symbols depends not only on token diversity or vocabulary size but also, critically, on the geometry of accidental token overlaps during training. Analyze these collision geometries to check that the ideal classification margin is preserved. This matters most when designing abstraction-based reasoning methods, which help only if they increase the template-level margin by more than the additional collision interactions they introduce.

Key insights

Fresh-symbol generalization in transformers depends on collision graph geometry, not just token diversity.
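As a rough illustration of what a collision graph over training data could look like, the sketch below links two training examples whenever they share a token and colors the edge by whether their labels agree. This is only one plausible reading: the summary does not give the paper's definition, so the coloring rule, the edge weight (number of shared tokens), and the names `collision_graph` and `collision_rate` are all assumptions.

```python
from itertools import combinations

def collision_graph(examples):
    """Build a toy collision graph.

    examples: list of (tokens, label) pairs.
    Returns a dict mapping (i, j) with i < j to (color, shared_count), where
    color is "same" if the two labels agree and "diff" otherwise (assumed rule).
    """
    edges = {}
    for (i, (tok_i, y_i)), (j, (tok_j, y_j)) in combinations(enumerate(examples), 2):
        shared = set(tok_i) & set(tok_j)
        if shared:  # accidental token overlap between two training examples
            color = "same" if y_i == y_j else "diff"
            edges[(i, j)] = (color, len(shared))
    return edges

def collision_rate(examples):
    """Fraction of example pairs that collide: a scalar summary that ignores geometry."""
    n = len(examples)
    pairs = n * (n - 1) // 2
    return len(collision_graph(examples)) / pairs if pairs else 0.0

examples = [
    (["a", "b", "a"], 0),
    (["a", "c", "c"], 1),  # shares "a" with example 0, opposite label
    (["d", "e", "d"], 0),  # no overlap with the others
]
print(collision_graph(examples))
print(collision_rate(examples))
```

The scalar `collision_rate` shrinks as the vocabulary grows, but two datasets with the same rate can have very different edge structure, which is the distinction the insight above draws.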

Method

The paper analyzes regularized kernel logistic classification in the transformer-kernel regime, decomposing predictors into ideal template classifiers and finite-sample perturbations. It uses a colored collision graph to encode token overlaps and prove margin-transfer guarantees.
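The following is a minimal, standard-library sketch of regularized kernel logistic classification evaluated on fresh symbols. It does not use the paper's transformer kernel: `toy_kernel` is a hypothetical stand-in that adds a token-identity term (which creates collisions between training examples) to a renaming-invariant template term. The objective, the mean logistic loss plus `lam` times the squared RKHS norm, is the usual textbook form and is assumed here; the data, step size, and iteration count are arbitrary choices.

```python
import math

def toy_kernel(x, z):
    """Hypothetical kernel on equal-length token sequences (not the transformer kernel)."""
    # Token-identity term: positions where the two sequences use the same symbol.
    token_part = sum(a == b for a, b in zip(x, z))
    # Template term: position pairs whose equality pattern agrees (renaming-invariant).
    n = len(x)
    template_part = sum(
        (x[p] == x[q]) == (z[p] == z[q])
        for p in range(n) for q in range(p + 1, n)
    )
    return float(token_part + template_part)

def fit_kernel_logistic(xs, ys, lam=0.01, lr=0.05, steps=500):
    """Gradient descent on mean log(1 + exp(-y f)) + lam * alpha^T K alpha, labels in {-1, +1}."""
    n = len(xs)
    K = [[toy_kernel(a, b) for b in xs] for a in xs]
    alpha = [0.0] * n
    for _ in range(steps):
        f = [sum(K[i][j] * alpha[j] for j in range(n)) for i in range(n)]
        # Derivative of the mean logistic loss with respect to each f_i.
        g = [-ys[i] / (1.0 + math.exp(ys[i] * f[i])) / n for i in range(n)]
        r = [g[i] + 2.0 * lam * alpha[i] for i in range(n)]
        grad = [sum(K[i][j] * r[j] for j in range(n)) for i in range(n)]  # K (g + 2 lam alpha)
        alpha = [alpha[i] - lr * grad[i] for i in range(n)]
    return alpha

def predict(xs, alpha, x_new):
    """Real-valued score; its sign is the predicted label."""
    return sum(a * toy_kernel(x, x_new) for a, x in zip(alpha, xs))

# Template ABA has label +1 and template ABB has label -1; training symbols overlap by design.
train_x = [("a", "b", "a"), ("c", "d", "c"), ("a", "b", "b"), ("c", "d", "d")]
train_y = [1, 1, -1, -1]
alpha = fit_kernel_logistic(train_x, train_y)

# Fresh symbols never seen in training: only the template term of the kernel is nonzero.
print(predict(train_x, alpha, ("x", "y", "x")) > 0)  # template ABA
print(predict(train_x, alpha, ("x", "y", "y")) < 0)  # template ABB
```

In this toy, the token-identity term plays the role of the finite-sample perturbation: it changes the fitted coefficients through the training Gram matrix even though it contributes nothing at test time on fresh symbols.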

Best for: AI Scientist, Research Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by stat.ML updates on arXiv.org.