Stanford AI Lab Papers and Talks at ICLR 2022

2022-04-25 · Source: The Stanford AI Lab Blog · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Robotics & Autonomous Systems, Data Science & Analytics · Depth: Advanced, medium

Summary

Stanford AI Lab (SAIL) researchers presented 19 papers at the International Conference on Learning Representations (ICLR) 2022, held virtually from April 25 to April 29. Key contributions include "Autonomous Reinforcement Learning: Formalism and Benchmarking," which formalizes reinforcement learning in continual, non-episodic settings where the agent cannot rely on episode resets, and "MetaShift," a dataset for evaluating contextual distribution shifts. Other notable works explain in-context learning in large language models such as GPT-3, propose GreaseLM for question answering that fuses language models with knowledge graphs, and investigate efficient model editing. Robotics papers cover vision-based manipulators and learning inter-object functional relationships in 3D scenes. Further work covers sparse training for neural networks, efficient long-sequence modeling with structured state spaces, and methods for assessing shifts in machine learning APIs.

Key takeaway

For NLP engineers and research scientists working with large language models, a useful framing is that in-context learning can arise from implicit Bayesian inference. Under this view, the model infers a latent concept that it learned from long-range coherent pre-training data. You should therefore favor pre-training distributions that exhibit this coherence. You should also check how prompt formatting and example ordering affect few-shot performance; in some cases, zero-shot prompting can even outperform one-shot prompting.

Key insights

In-context learning in large language models can emerge from modeling long-range coherence in pre-training data.

Principles

In-context learning involves implicit Bayesian inference.
Model scaling benefits in-context learning accuracy.
Pre-training distribution is key for in-context learning emergence.
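The "implicit Bayesian inference" principle can be illustrated with a toy calculation. The prompt examples update a posterior over latent concepts. The prediction for a new input is then the posterior-weighted average of each concept's prediction.

This is a minimal sketch under stated assumptions, not the paper's model. The three concepts (`identity`, `negation`, `constant`), their label probabilities, and the uniform prior are all hypothetical.

```python
from math import prod

# Hypothetical concepts: each maps a binary input x to p(label = 1 | x, concept).
# These names and numbers are illustrative assumptions, not from the paper.
CONCEPTS = {
    "identity": {0: 0.1, 1: 0.9},  # label tends to copy x
    "negation": {0: 0.9, 1: 0.1},  # label tends to flip x
    "constant": {0: 0.5, 1: 0.5},  # label ignores x
}
PRIOR = {name: 1 / len(CONCEPTS) for name in CONCEPTS}  # assumed uniform prior


def likelihood(concept, x, y):
    """p(y | x, concept) for a binary label y."""
    p1 = CONCEPTS[concept][x]
    return p1 if y == 1 else 1 - p1


def posterior(examples):
    """p(concept | examples): prior times product of per-example likelihoods, normalized."""
    unnorm = {
        c: PRIOR[c] * prod(likelihood(c, x, y) for x, y in examples)
        for c in CONCEPTS
    }
    z = sum(unnorm.values())
    return {c: w / z for c, w in unnorm.items()}


def predict(examples, x_test):
    """p(y = 1 | x_test, examples): average of concept predictions weighted by the posterior."""
    post = posterior(examples)
    return sum(post[c] * CONCEPTS[c][x_test] for c in CONCEPTS)


if __name__ == "__main__":
    shots = [(0, 1), (1, 0), (0, 1)]  # in-context examples consistent with "negation"
    print(posterior(shots))
    print(predict(shots, 1))
```

With no examples, the prediction falls back to the prior average. Each example that is consistent with one concept concentrates the posterior on that concept, so the prediction increasingly follows it.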

Method

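The authors propose a pre-training distribution in which each document is generated by a hidden Markov model (HMM). A latent concept, drawn once per document, determines the HMM's hidden-state transition matrix. Pre-training on this distribution enables in-context learning to emerge.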
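A much-simplified generator conveys the idea of documents conditioned on a latent concept: sample a concept once per document, then run an HMM whose transition matrix depends on that concept.

This is a sketch, not the GINC generator. The state and token counts, the matrices, the shared emission matrix, the uniform initial state, and the function names are all hypothetical.

```python
from __future__ import annotations

import random

# Hypothetical toy setup: 3 hidden states, 4 tokens, 2 latent concepts.
# Each concept selects its own hidden-state transition matrix.
TRANSITIONS = {
    "concept_a": [[0.1, 0.8, 0.1], [0.1, 0.1, 0.8], [0.8, 0.1, 0.1]],
    "concept_b": [[0.1, 0.1, 0.8], [0.8, 0.1, 0.1], [0.1, 0.8, 0.1]],
}
# Emission probabilities p(token | hidden state); shared across concepts (assumption).
EMISSIONS = [
    [0.7, 0.1, 0.1, 0.1],
    [0.1, 0.7, 0.1, 0.1],
    [0.1, 0.1, 0.4, 0.4],
]
TOKENS = ["w0", "w1", "w2", "w3"]


def sample_document(length: int, rng: random.Random) -> tuple[str, list[str]]:
    """Draw one latent concept, then a token sequence from that concept's HMM."""
    concept = rng.choice(sorted(TRANSITIONS))  # concept is fixed for the whole document
    trans = TRANSITIONS[concept]
    state = rng.randrange(len(trans))  # uniform initial hidden state (assumption)
    doc = []
    for _ in range(length):
        doc.append(rng.choices(TOKENS, weights=EMISSIONS[state])[0])
        state = rng.choices(range(len(trans)), weights=trans[state])[0]
    return concept, doc


def sample_corpus(n_docs: int, length: int, seed: int = 0) -> list[tuple[str, list[str]]]:
    """Generate a reproducible toy pre-training corpus."""
    rng = random.Random(seed)
    return [sample_document(length, rng) for _ in range(n_docs)]


if __name__ == "__main__":
    for concept, doc in sample_corpus(3, 12, seed=1):
        print(concept, " ".join(doc))
```

The concept stays fixed for the whole document, so distant tokens share the same transition dynamics. A model that predicts the next token well therefore benefits from inferring the concept from earlier context. This is the long-range coherence the summary refers to.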

In practice

Use the synthetic GINC dataset to study in-context learning.
Consider latent concept structures in pre-training data.
Be aware of example ordering sensitivity in few-shot prompts.
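One simple way to check example-ordering sensitivity is to score every permutation of the few-shot examples and look at the spread of scores.

This is a minimal sketch. The `input -> label` prompt format and the helper names are hypothetical. The `score` callable is an assumption you would supply, for example a model's probability of the correct label.

```python
from itertools import permutations
from statistics import mean, pstdev


def format_prompt(examples, query, sep="\n"):
    """Render (input, label) pairs as 'input -> label' lines, followed by the query."""
    lines = [f"{x} -> {y}" for x, y in examples]
    lines.append(f"{query} ->")
    return sep.join(lines)


def ordering_sensitivity(examples, query, score):
    """Score every ordering of the few-shot examples.

    Returns (mean score, population std of scores, list of prompts).
    `score` is a user-supplied callable mapping a prompt string to a float.
    """
    prompts = [format_prompt(order, query) for order in permutations(examples)]
    scores = [score(p) for p in prompts]
    return mean(scores), pstdev(scores), prompts


if __name__ == "__main__":
    shots = [("a", "1"), ("b", "0"), ("c", "1")]
    # Dummy scorer for illustration only; replace with a real model call.
    m, sd, prompts = ordering_sensitivity(shots, "d", lambda p: float(p.startswith("a")))
    print(len(prompts), m, sd)
```

The number of orderings grows as k! for k examples. Beyond a handful of shots, you would sample a subset of permutations rather than enumerate them all.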

Topics

Reinforcement Learning
Language Models
Distribution Shift
Robotics
Sparse Training

Best for: NLP Engineer, Research Scientist, AI Researcher, AI Scientist, AI Student

Editorial summary, takeaway, and curation by AIssential. Original article published by The Stanford AI Lab Blog.