Towards Generalist Agents for Accelerating Scientific Discovery
Summary
Yi, a final-year PhD student at Cornell who is joining Microsoft Research, presented his work on using large language models (LLMs) and agentic systems for scientific discovery. He highlighted the limitations of common LLM evaluation methods for scientific problems, which often rely on qualitative expert assessments or competition exams, and proposed an alternative focused on extracting non-trivial scientific hypotheses from the models. His approach uses LLMs as heuristic proposers within evolutionary algorithms to optimize small molecules and generate crystal structures, and he reported that it outperforms specialized models. Yi also introduced SAGA, an agentic system designed to automate parts of the scientific workflow. It consists of a Planner, an Implementer, an Optimizer, and an Analyzer, which together iteratively refine objectives and preferences for tasks like antibiotic and inorganic material design. The system operates in co-pilot, semi-pilot, and autopilot modes, and aims to augment scientists rather than fully replace them.
Key takeaway
AI researchers and research scientists developing tools for scientific discovery should consider integrating LLMs into iterative, agentic workflows that prioritize hypothesis generation and refinement over simple question answering. Focus on building systems like SAGA that allow objectives and preferences to be adjusted dynamically, which enables more robust and diverse discovery, rather than solely pursuing fully autonomous AI scientists. This approach can make tasks like drug and material design more efficient.
Key insights
LLMs can accelerate scientific discovery by generating and refining hypotheses within agentic, iterative workflows.
Principles
- Scientific knowledge emerges strongly in large models.
- Search and sampling are effective for knowledge extraction.
- Automation should augment, not replace, scientists.
Method
SAGA employs a Planner to define objectives, an Implementer to code functions, an Optimizer to find solutions, and an Analyzer to provide feedback, iteratively refining the discovery process.
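The four roles above can be sketched as a simple loop. This is a toy illustration only, not SAGA's actual implementation: the function names, the `Plan` class, the two objectives ("potency" and "safety"), and the weight-update rule are all assumptions made for the example, and candidates are plain numbers standing in for molecules or materials.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Candidate = float  # toy stand-in for a molecule or crystal structure
Objective = Callable[[Candidate], float]


@dataclass
class Plan:
    # Objective names mapped to preference weights (higher = more important).
    weights: Dict[str, float] = field(default_factory=dict)


def planner(plan: Plan, feedback: Dict[str, float]) -> Plan:
    """Hypothetical Planner: raise the weight of objectives that fell short."""
    new_weights = dict(plan.weights)
    for name, shortfall in feedback.items():
        new_weights[name] = new_weights.get(name, 1.0) + shortfall
    return Plan(new_weights)


def implementer(plan: Plan) -> Dict[str, Objective]:
    """Hypothetical Implementer: map objective names to scoring functions.

    A real system would write this code; here the functions are hard-coded.
    """
    library: Dict[str, Objective] = {
        "potency": lambda x: -(x - 3.0) ** 2,  # best at x = 3
        "safety": lambda x: -abs(x - 1.0),     # best at x = 1
    }
    return {name: library[name] for name in plan.weights}


def optimizer(plan: Plan, objectives: Dict[str, Objective],
              candidates: List[Candidate]) -> Candidate:
    """Hypothetical Optimizer: pick the candidate with the best weighted score."""
    def score(x: Candidate) -> float:
        return sum(plan.weights[n] * f(x) for n, f in objectives.items())
    return max(candidates, key=score)


def analyzer(best: Candidate, objectives: Dict[str, Objective]) -> Dict[str, float]:
    """Hypothetical Analyzer: report each objective's distance from its ideal (0)."""
    return {name: -f(best) for name, f in objectives.items()}


def run_loop(rounds: int = 3) -> List[Candidate]:
    plan = Plan({"potency": 1.0, "safety": 1.0})
    candidates = [i / 10 for i in range(51)]  # grid from 0.0 to 5.0
    feedback: Dict[str, float] = {}
    history: List[Candidate] = []
    for _ in range(rounds):
        plan = planner(plan, feedback)           # refine preferences
        objectives = implementer(plan)           # turn objectives into code
        best = optimizer(plan, objectives, candidates)
        feedback = analyzer(best, objectives)    # feed results back to the Planner
        history.append(best)
    return history
```

Calling `run_loop()` returns the best candidate from each round, so the effect of the Analyzer's feedback on the Planner's weights can be seen as the chosen candidate shifts between rounds.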
In practice
- Use LLMs as proposers in evolutionary algorithms for molecular optimization.
- Implement agentic systems to automate objective definition and refinement.
- Consider multi-objective optimization with dynamic preference adjustment.
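The first point above, using an LLM as the proposer inside an evolutionary algorithm, might look like the following minimal sketch. Everything here is an assumption for illustration: the `Proposer` interface, the string-based toy "molecules", the `TARGET` string, and `mock_llm_proposer`, which mutates one character at random in place of a real LLM call.

```python
import random
from typing import Callable, List

# Hypothetical proposer interface: given parent candidates, return new candidates.
Proposer = Callable[[List[str], random.Random], List[str]]

ALPHABET = "CNO"
TARGET = "CCNOCC"  # toy "ideal molecule" string, purely illustrative


def fitness(candidate: str) -> int:
    """Count the positions where the candidate matches the toy target."""
    return sum(a == b for a, b in zip(candidate, TARGET))


def mock_llm_proposer(parents: List[str], rng: random.Random) -> List[str]:
    """Stand-in for an LLM call: change one random character of each parent."""
    children = []
    for parent in parents:
        i = rng.randrange(len(parent))
        children.append(parent[:i] + rng.choice(ALPHABET) + parent[i + 1:])
    return children


def evolve(proposer: Proposer, generations: int = 50,
           pop_size: int = 8, seed: int = 0) -> str:
    rng = random.Random(seed)
    population = ["".join(rng.choice(ALPHABET) for _ in TARGET)
                  for _ in range(pop_size)]
    for _ in range(generations):
        # Select the fitter half as parents and ask the proposer for children.
        parents = sorted(population, key=fitness, reverse=True)[: pop_size // 2]
        children = proposer(parents, rng)
        # Elitist survival: the best of parents and children carry over.
        population = sorted(parents + children, key=fitness, reverse=True)[:pop_size]
    return max(population, key=fitness)
```

Because the proposer is just a callable, the random mutation can be swapped for a function that prompts a model with the current parents and parses its suggestions, while the selection loop stays unchanged.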
Topics
- Scientific Discovery Automation
- Large Language Models
- Agentic AI Systems
- Evolutionary Algorithms
- Materials Design
Best for: AI Researcher, AI Scientist, Research Scientist
Editorial summary, takeaway, and curation by AIssential. Original article published by Ai2.