Are LLMs safe?

· Source: NLP Highlights · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics · Depth: Advanced, extended

Summary

Suchin Gururangan, a Young Investigator at the Allen Institute for Artificial Intelligence and a Data Science Engineer at Appuri, discussed sociotechnical approaches to training large language models (LLMs) on the NLP Highlights podcast. He highlighted critical issues with current LLM training methods, which often pre-train transformer models on vast, indiscriminately collected internet corpora. Gururangan's research, including his PhD work at the University of Washington, focuses on the relationship between LLM behavior and training data. It advocates greater attention to language variation and to post-training customization.

He detailed how the "quality filters" used to build training datasets such as GPT-3's can inadvertently introduce bias. These filters over-represent content from well-resourced, urban, and wealthy areas while implicitly disfavoring content from rural or less-resourced regions. The resulting models are not truly "general purpose"; they are constrained by the implicit ideologies of their curators.
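To make the mechanism concrete, here is a minimal sketch of a classifier-based quality filter. It uses a toy unigram log-odds scorer as a stand-in for the logistic-regression classifier GPT-3 is reported to have used. That classifier was trained to separate curated reference text from raw Common Crawl. The `keep` rule is modeled on the Pareto-sampling rule described in the GPT-3 paper's appendix (keep a document if a Pareto draw with α = 9 exceeds 1 minus its score). All function names and the example corpora are hypothetical.

```python
import math
import random
from collections import Counter


def train_log_odds(reference_docs, raw_docs, smoothing=1.0):
    """Per-token log-odds of appearing in the curated reference set vs. the raw crawl."""
    ref = Counter(tok for d in reference_docs for tok in d.lower().split())
    raw = Counter(tok for d in raw_docs for tok in d.lower().split())
    vocab = set(ref) | set(raw)
    ref_total = sum(ref.values()) + smoothing * len(vocab)
    raw_total = sum(raw.values()) + smoothing * len(vocab)
    return {
        tok: math.log((ref[tok] + smoothing) / ref_total)
        - math.log((raw[tok] + smoothing) / raw_total)
        for tok in vocab
    }


def quality_score(doc, log_odds):
    """Squash the average token log-odds into (0, 1); unseen tokens count as 0."""
    toks = doc.lower().split()
    if not toks:
        return 0.0
    avg = sum(log_odds.get(t, 0.0) for t in toks) / len(toks)
    return 1.0 / (1.0 + math.exp(-avg))


def keep(doc_score, alpha=9.0, rng=random):
    """Stochastically keep a document; higher scores are kept far more often."""
    # random.paretovariate has minimum 1; subtracting 1 gives the zero-based
    # (Lomax) draw that numpy.random.pareto returns.
    return rng.paretovariate(alpha) - 1.0 > 1.0 - doc_score
```

Nothing in this scorer measures quality directly. A document scores well only insofar as its vocabulary resembles the reference set. That is how a filter can end up disfavoring dialects, registers, and topics that are underrepresented in whatever text the curators chose as "high quality."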

Key takeaway

AI scientists and research scientists developing or deploying LLMs should recognize that current training practices embed biases through data curation. Prioritize customization and adaptation strategies, such as multi-stage adaptive pre-training or task arithmetic, to tailor models to specific domains and mitigate unintended biases. Curate high-quality, domain-relevant data: it is key to scaling efficiently and to reaching the desired model capabilities, even for well-resourced teams.
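Multi-stage adaptive pre-training amounts to continuing the same language-modeling objective from the previous checkpoint on progressively narrower data (broad domain first, then task data). The sketch below is a deliberately tiny illustration under that assumption. A unigram language model (a softmax over per-token logits), trained by gradient descent in pure Python, stands in for a transformer. The function names, step count, learning rate, and corpora are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List

Logits = Dict[str, float]


def train_unigram_lm(logits: Logits, docs: List[str], steps: int = 200, lr: float = 1.0) -> Logits:
    """Continue training a unigram LM, p(t) = softmax(logits)[t], on `docs`.

    Training starts from `logits`, i.e. from the previous stage's checkpoint,
    and minimizes the mean negative log-likelihood by gradient descent.
    """
    counts = Counter(tok for doc in docs for tok in doc.split() if tok in logits)
    total = sum(counts.values())
    logits = dict(logits)  # copy so earlier checkpoints stay untouched
    for _ in range(steps):
        z = max(logits.values())
        exp = {t: math.exp(v - z) for t, v in logits.items()}
        norm = sum(exp.values())
        for t in logits:
            # d(mean NLL)/d(logit_t) = model probability - empirical frequency
            logits[t] -= lr * (exp[t] / norm - counts[t] / total)
    return logits


def perplexity(logits: Logits, docs: List[str]) -> float:
    """exp(mean NLL) of `docs`; every token must be in the vocabulary."""
    z = max(logits.values())
    log_norm = z + math.log(sum(math.exp(v - z) for v in logits.values()))
    tokens = [t for doc in docs for t in doc.split()]
    return math.exp(sum(log_norm - logits[t] for t in tokens) / len(tokens))


def adaptive_pretrain(logits: Logits, stages: List[List[str]], steps: int = 200, lr: float = 1.0) -> List[Logits]:
    """Run stages in order (broad to specific); each starts from the last checkpoint."""
    checkpoints = []
    for docs in stages:
        logits = train_unigram_lm(logits, docs, steps=steps, lr=lr)
        checkpoints.append(logits)
    return checkpoints
```

For example, calling `adaptive_pretrain` on a general, then biomedical, then task corpus returns one checkpoint per stage. Comparing `perplexity` on held-out task text across those checkpoints shows what each stage contributed. A real pipeline would apply the same continue-from-checkpoint pattern to a transformer's masked or causal language-modeling loss.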

Key insights

LLM training data curation, especially via quality filters, embeds implicit biases that shape model behavior and capabilities.

Method

Adaptive pre-training proceeds in stages: the model is first further pre-trained on a broad domain corpus, then on increasingly specific task data. Task arithmetic merges or interpolates "task vectors" (the weight differences between a fine-tuned model and its pre-trained base) to compose new model behaviors, such as non-toxic chat.
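Task arithmetic operates directly on parameters. Here is a minimal sketch. It assumes weights are plain Python dicts that map hypothetical parameter names to flat lists of floats; a real model would use framework tensors, such as a PyTorch state dict. A task vector is the element-wise difference between fine-tuned and pre-trained weights. New behaviors are composed by adding scaled task vectors back to the pre-trained weights. The "chat" and "toxic" weights are made up for illustration.

```python
from typing import Dict, List

Weights = Dict[str, List[float]]


def task_vector(finetuned: Weights, pretrained: Weights) -> Weights:
    """tau = theta_finetuned - theta_pretrained, parameter by parameter."""
    return {
        name: [f - p for f, p in zip(finetuned[name], pretrained[name])]
        for name in pretrained
    }


def apply_task_vectors(pretrained: Weights, vectors: List[Weights], scales: List[float]) -> Weights:
    """theta_new = theta_pretrained + sum_i scale_i * tau_i.

    A positive scale adds a behavior, a negative scale subtracts ("negates") it,
    and a scale between 0 and 1 interpolates toward the fine-tuned model.
    """
    merged = {name: list(values) for name, values in pretrained.items()}
    for tau, scale in zip(vectors, scales):
        for name, delta in tau.items():
            merged[name] = [m + scale * d for m, d in zip(merged[name], delta)]
    return merged


# Hypothetical toy weights: one parameter tensor "w" with two entries.
base = {"w": [1.0, 2.0]}
chat = {"w": [1.5, 2.0]}   # base fine-tuned on chat data
toxic = {"w": [1.0, 3.0]}  # base fine-tuned on toxic text

non_toxic_chat = apply_task_vectors(
    base, [task_vector(chat, base), task_vector(toxic, base)], [1.0, -1.0]
)
```

Here `non_toxic_chat` adds the chat vector and subtracts the toxicity vector, which mirrors the "non-toxic chat" composition in this sketch. How well such arithmetic transfers to real networks is an empirical question, and the scales are typically tuned on held-out data.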

Best for: AI Scientist, Research Scientist, CTO, AI Researcher, Machine Learning Engineer, Data Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by NLP Highlights.