Meta-Learning Transformers to Improve In-Context Generalization

2026-06-12 · Source: cs.AI updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning · Depth: Expert, extended

Summary

The paper introduces GEOM, a meta-learned transformer architecture designed to improve in-context generalization by training on multiple small, domain-specific datasets rather than a single large, unstructured corpus. This approach addresses the storage costs, quality-evaluation difficulties, and privacy concerns associated with massive datasets. In experiments on the Meta-Album collection, GEOM matches or exceeds models trained on large-scale datasets, particularly in cross-domain scenarios. The paper evaluates GEOM in supervised (offline), sequential (lifelong learning with curriculum strategies), and unsupervised settings. The findings point to improved generalization, robustness to forgetting, and gains in modularity, interpretability, and adaptability, even when the model is trained on pseudo-labeled data.

Key takeaway

Machine learning engineers developing in-context learning models should consider shifting from massive, uncurated datasets to collections of smaller, domain-specific ones. This strategy, exemplified by GEOM, can yield comparable or better cross-domain generalization while offering tighter data control, modularity, and adaptability. Prioritize class diversity over raw image count, and explore curriculum learning, especially Hard-to-Easy or Easy-to-Easy sequencing, to improve sequential knowledge accumulation and mitigate forgetting.

Key insights

Training transformers on diverse small datasets improves in-context generalization and modularity compared with training on large, uncurated corpora.

Principles

Class diversity is more critical than image quantity for generalization.
Sequential learning with structured curricula enhances knowledge accumulation.
Models can generalize effectively even with pseudo-labeled data.
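
The curriculum principle can be illustrated with a minimal sketch that orders datasets from hardest to easiest before sequential training. The difficulty scores, the dataset names, and the helper functions `hard_to_easy_order` and `sequential_schedule` are all hypothetical; this summary does not say how the paper measures difficulty, so any proxy (for example, one minus a baseline accuracy) is an assumption.

```python
from typing import Dict, List, Tuple


def hard_to_easy_order(difficulty: Dict[str, float]) -> List[str]:
    """Return dataset names sorted from highest to lowest difficulty.

    `difficulty` maps a dataset name to a hypothetical difficulty score,
    where a larger value means a harder dataset.
    """
    return sorted(difficulty, key=lambda name: difficulty[name], reverse=True)


def sequential_schedule(order: List[str], steps_per_dataset: int) -> List[Tuple[str, int]]:
    """Expand an ordering into (dataset, step) pairs, one dataset at a time."""
    return [(name, step) for name in order for step in range(steps_per_dataset)]


# Placeholder scores for three made-up domains (not taken from the paper).
difficulty = {"domain_a": 0.35, "domain_b": 0.80, "domain_c": 0.55}
order = hard_to_easy_order(difficulty)
schedule = sequential_schedule(order, steps_per_dataset=2)
print(order)
print(schedule[:3])
```

In a real lifelong-learning loop, each `(dataset, step)` pair would trigger one training step on episodes drawn from that dataset before moving on to the next one.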

Method

GEOM reformulates meta-learning as a sequence modeling problem: sequences of image-label pairs are fed, without causal masking, into a transformer encoder that predicts the label of a query image.
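
To make the sequence formulation concrete, the sketch below builds one episode as a list of tokens and contrasts a non-causal attention mask with a causal one. It is an illustration under stated assumptions, not GEOM's implementation: the two-dimensional feature vectors stand in for image embeddings, the one-hot label slot and the all-zero slot for the query are assumed encodings, and the function names are hypothetical.

```python
from typing import List, Sequence


def one_hot(label: int, num_classes: int) -> List[float]:
    """Encode an integer label as a one-hot vector."""
    vec = [0.0] * num_classes
    vec[label] = 1.0
    return vec


def build_episode_sequence(
    support_feats: Sequence[Sequence[float]],
    support_labels: Sequence[int],
    query_feat: Sequence[float],
    num_classes: int,
) -> List[List[float]]:
    """Concatenate each image embedding with its label slot.

    The query token carries an all-zero label slot (assumed encoding for
    "unknown"); the encoder's output at that position would be decoded
    into the predicted label.
    """
    tokens = [
        list(feat) + one_hot(label, num_classes)
        for feat, label in zip(support_feats, support_labels)
    ]
    tokens.append(list(query_feat) + [0.0] * num_classes)
    return tokens


def attention_mask(seq_len: int, causal: bool) -> List[List[bool]]:
    """mask[i][j] is True when token i may attend to token j."""
    return [[(j <= i) if causal else True for j in range(seq_len)]
            for i in range(seq_len)]


tokens = build_episode_sequence(
    support_feats=[[0.1, 0.9], [0.8, 0.2]],
    support_labels=[0, 1],
    query_feat=[0.7, 0.3],
    num_classes=2,
)
non_causal = attention_mask(len(tokens), causal=False)
print(tokens[-1])
print(non_causal[0])
```

With the non-causal mask, every token, including the query, can attend to every support pair regardless of position, so the order of the support examples carries no privileged meaning.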

In practice

Curate small, domain-specific datasets for better control and adaptability.
Implement Hard-to-Easy (H2E) or Easy-to-Easy (E2E) curricula for sequential training.
Explore unsupervised meta-learning with data augmentation for unlabeled data.
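
One common way to build training episodes from unlabeled images is to treat each sampled image as its own pseudo-class and use an augmented view as the query. The sketch below follows that recipe; it is an assumption about how such a pipeline might look, not the paper's confirmed procedure. The tiny nested-list images, the two augmentations, and the function `make_pseudo_episode` are hypothetical.

```python
import random
from typing import List, Tuple

Image = List[List[float]]  # grayscale image as rows of pixel values


def horizontal_flip(img: Image) -> Image:
    """Mirror the image left to right."""
    return [row[::-1] for row in img]


def add_noise(img: Image, rng: random.Random, scale: float = 0.05) -> Image:
    """Add uniform noise in [-scale, scale] to every pixel."""
    return [[p + rng.uniform(-scale, scale) for p in row] for row in img]


def make_pseudo_episode(
    unlabeled: List[Image], n_way: int, rng: random.Random
) -> Tuple[List[Tuple[Image, int]], List[Tuple[Image, int]]]:
    """Build an n_way episode from unlabeled images.

    Each sampled image gets its own pseudo-label; the original image is
    the support example and an augmented view is the query example.
    """
    chosen = rng.sample(range(len(unlabeled)), n_way)
    support, query = [], []
    for pseudo_label, idx in enumerate(chosen):
        img = unlabeled[idx]
        support.append((img, pseudo_label))
        query.append((add_noise(horizontal_flip(img), rng), pseudo_label))
    return support, query


# Five 1x2 placeholder "images"; real inputs would be full image tensors.
unlabeled = [[[float(i), float(i) + 1.0]] for i in range(5)]
support, query = make_pseudo_episode(unlabeled, n_way=3, rng=random.Random(0))
print([label for _, label in support])
```

A known limitation of this recipe is that two sampled images may share a true class while receiving different pseudo-labels; sampling from a large, diverse pool makes such collisions less likely.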

Topics

In-Context Learning
Meta-Learning
Transformer Models
Dataset Curation
Cross-Domain Generalization
Curriculum Learning
Unsupervised Learning

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, AI Student

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.