Rethinking Pre-Training for Agentic AI [Aakanksha Chowdhery] - 759
Summary
Aakanksha Chowdhery, a member of technical staff at Reflection and a key contributor to large language models like PaLM and Gemini, discusses the need to fundamentally rethink pre-training for agentic AI. She argues that current pre-training, measured on static benchmarks, limits models' capabilities for interactive, goal-oriented tasks such as coding and deep research agents. Chowdhery emphasizes that achieving advanced agentic behaviors like planning, long-form reasoning, tool use, and recovery from failure requires evolving the attention mechanism, optimizing loss objectives beyond next-token prediction, and curating high-quality, diverse training data that includes expert reasoning traces. Reflection, her company, is focused on building frontier open agentic models, encompassing both pre-training and post-training, to address these challenges and develop more capable and cost-efficient AI.
Key takeaway
AI scientists and research scientists developing next-generation agentic AI should prioritize foundational changes in pre-training rather than relying solely on post-training fine-tuning. Concentrate on designing training data that includes multi-step reasoning traces and on optimizing loss functions to explicitly teach planning, tool use, and error recovery. These capabilities are crucial for robust agent performance and cannot be fully "tacked on" later.
Key insights
Agentic AI requires a fundamental shift in pre-training, moving beyond static benchmarks to foster interactive capabilities.
Principles
- Scale magnifies all problems in pre-training.
- Pre-training changes are high-risk, high-capital bets.
- Emergent capabilities often appear first in larger models.
Method
Rethink pre-training by evolving attention mechanisms for long-form reasoning, optimizing loss objectives with masking and data augmentation, and curating high-quality, diverse training data with expert reasoning traces.
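As a rough illustration of what "loss objectives with masking" can mean, here is a minimal sketch of next-token cross-entropy where a 0/1 mask decides which positions contribute to the loss. The summary does not say which tokens are masked in practice. Excluding tokens the model did not produce itself, such as tool output pasted into the context, is an assumption made here for illustration, and the function name `masked_next_token_loss` is hypothetical.

```python
import math

def masked_next_token_loss(logits, targets, loss_mask):
    """Mean cross-entropy over the positions where loss_mask is 1.

    logits:    list of per-position score lists (one score per vocabulary id)
    targets:   list of gold next-token ids, one per position
    loss_mask: list of 0/1 flags; 0 drops that position from the objective
    """
    total, counted = 0.0, 0
    for scores, target, keep in zip(logits, targets, loss_mask):
        if not keep:
            continue
        # Numerically stable log-sum-exp, then negative log-probability of the target.
        peak = max(scores)
        log_norm = peak + math.log(sum(math.exp(s - peak) for s in scores))
        total += log_norm - scores[target]
        counted += 1
    # If every position is masked there is nothing to average over.
    return total / counted if counted else 0.0

# Toy sequence over a 3-token vocabulary. The middle position is masked out
# (assumed here to be tool output), so it does not affect the loss.
logits = [[2.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 3.0, 0.0]]
targets = [0, 2, 1]
loss_mask = [1, 0, 1]
loss = masked_next_token_loss(logits, targets, loss_mask)
```

Averaging over only the unmasked positions keeps the loss scale comparable between sequences with many masked tokens and sequences with few.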
In practice
- Develop in-house benchmarks for specific agentic capabilities.
- Explore unified domain-specific languages for tool learning.
- Focus on generating high-quality reasoning traces for training data.
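The first item above, in-house benchmarks for specific agentic capabilities, could start as something as small as the following sketch. Everything in it is hypothetical: the `Task` record, the capability labels, and the stub agent are illustrative names, not anything described in the episode. The point is only that results are reported per capability rather than as one aggregate score.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class Task:
    capability: str               # e.g. "planning", "tool_use", "error_recovery"
    prompt: str
    check: Callable[[str], bool]  # returns True when the agent output is acceptable

def score_by_capability(agent: Callable[[str], str], tasks: List[Task]) -> Dict[str, float]:
    """Return the pass rate for each capability label found in tasks."""
    passed: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for task in tasks:
        total[task.capability] += 1
        try:
            ok = task.check(agent(task.prompt))
        except Exception:
            ok = False  # a crash in the agent or the checker counts as a failure
        passed[task.capability] += int(ok)
    return {cap: passed[cap] / total[cap] for cap in total}

# Stub agent that always answers "4", standing in for a real model call.
def stub_agent(prompt: str) -> str:
    return "4"

tasks = [
    Task("tool_use", "Compute 2 + 2 with the calculator.", lambda out: out == "4"),
    Task("tool_use", "Compute 3 + 3 with the calculator.", lambda out: out == "6"),
    Task("planning", "List the steps before answering.", lambda out: len(out) > 0),
]
scores = score_by_capability(stub_agent, tasks)
```

Keeping the checker as a plain function per task makes it easy to add new capabilities without changing the scoring loop.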
Topics
- Agentic AI
- LLM Pre-training
- Long-form Reasoning
- Training Data Curation
- AI Benchmarking
Best for: AI Scientist, Research Scientist, AI Researcher, AI Engineer, AI Architect
Editorial summary, takeaway, and curation by AIssential. Original article published by The TWIML AI Podcast with Sam Charrington.