Towards Generalist Agents for Accelerating Scientific Discovery

2026-03-03 · Source: Ai2 · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics, AI for Scientific Discovery · Depth: Expert, extended

Summary

Yi, a PhD student finishing at Cornell and joining Microsoft Research, presented his work on using large language models (LLMs) and agentic systems for scientific discovery. He argued that common ways of evaluating LLMs on scientific problems, which often rely on qualitative expert assessments or competition exams, are limited, and proposed an alternative focused on extracting non-trivial scientific hypotheses from the models. His approach uses LLMs as heuristic proposers inside evolutionary algorithms to optimize small molecules and generate crystal structures, and he reported that it outperforms specialized models. Yi also introduced SAGA, an agentic system designed to automate parts of the scientific workflow. It has four components (Planner, Implementer, Optimizer, and Analyzer) and iteratively refines objectives and preferences for tasks such as antibiotic and inorganic material design. The system operates in co-pilot, semi-pilot, and autopilot modes and aims to augment scientists rather than fully replace them.

Key takeaway

AI researchers and research scientists building tools for scientific discovery should consider integrating LLMs into iterative, agentic workflows that prioritize hypothesis generation and refinement over simple question answering. Focus on systems like SAGA that allow objectives and preferences to be adjusted dynamically, which supports more robust and diverse discovery, rather than solely pursuing fully autonomous AI scientists. This approach can make tasks such as drug and material design more efficient.

Key insights

LLMs can accelerate scientific discovery by generating and refining hypotheses within agentic, iterative workflows.

Principles

Scientific knowledge emerges strongly in large models.
Search and sampling are effective for knowledge extraction.
Automation should augment, not replace, scientists.

Method

SAGA employs a Planner to define objectives, an Implementer to code functions, an Optimizer to find solutions, and an Analyzer to provide feedback, iteratively refining the discovery process.
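
The four roles described above can be pictured as a simple loop. The following is a minimal sketch under stated assumptions, not SAGA's actual implementation: every name here (`Plan`, `planner`, `implementer`, `optimizer`, `analyzer`, `run_saga_like_loop`) is hypothetical, the candidate is a single number standing in for a molecule or material, and the two toy objectives are invented for illustration. In a real system each role would likely be backed by an LLM or a domain tool rather than a few lines of arithmetic.

```python
import random
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Candidate = float  # toy stand-in for a molecule or crystal structure
Objective = Callable[[Candidate], float]  # higher is better


@dataclass
class Plan:
    weights: Dict[str, float]  # objective name -> preference weight


def planner(plan: Plan, feedback: Dict[str, float]) -> Plan:
    """Shift preference toward objectives the analyzer reports as lagging."""
    raised = {k: w + feedback.get(k, 0.0) for k, w in plan.weights.items()}
    total = sum(raised.values())
    return Plan({k: v / total for k, v in raised.items()})


def implementer(plan: Plan) -> Dict[str, Objective]:
    """Turn objective names into scoring functions (hard-coded toy library)."""
    library: Dict[str, Objective] = {
        "near_two": lambda x: -abs(x - 2.0),
        "small": lambda x: -abs(x),
    }
    return {name: library[name] for name in plan.weights}


def optimizer(plan: Plan, objectives: Dict[str, Objective],
              rng: random.Random, n_samples: int = 200) -> Candidate:
    """Random search for the candidate with the best weighted score."""
    def score(x: Candidate) -> float:
        return sum(plan.weights[k] * f(x) for k, f in objectives.items())
    return max((rng.uniform(-5.0, 5.0) for _ in range(n_samples)), key=score)


def analyzer(best: Candidate, objectives: Dict[str, Objective]) -> Dict[str, float]:
    """Report the shortfall per objective (0.0 means fully satisfied)."""
    return {k: -f(best) for k, f in objectives.items()}


def run_saga_like_loop(rounds: int = 3, seed: int = 0
                       ) -> List[Tuple[Candidate, Dict[str, float]]]:
    rng = random.Random(seed)
    plan = Plan({"near_two": 0.5, "small": 0.5})
    history = []
    for _ in range(rounds):
        objectives = implementer(plan)             # code the objective functions
        best = optimizer(plan, objectives, rng)    # search for a solution
        feedback = analyzer(best, objectives)      # assess the result
        history.append((best, dict(plan.weights)))
        plan = planner(plan, feedback)             # refine preferences
    return history
```

Calling `run_saga_like_loop()` returns, for each round, the best candidate found and the preference weights that were in force, which makes the effect of the feedback step on later rounds easy to inspect.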

In practice

Use LLMs as proposers in evolutionary algorithms for molecular optimization.
Implement agentic systems to automate objective definition and refinement.
Consider multi-objective optimization with dynamic preference adjustment.
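
As an illustration of the first and third points, here is a minimal sketch of an evolutionary loop with a pluggable proposer and preference-weighted objectives. It rests on several assumptions: `stub_proposer` is a hypothetical stand-in for an LLM call (it only splices and mutates strings), the strings over the alphabet `CNO` are not real molecular representations, the two objectives are toy quantities, and the weighted sum is just one simple way to combine objectives. The source does not say which scalarization or selection scheme the presented work uses.

```python
import random
from typing import Callable, Dict, List

ALPHABET = "CNO"  # toy symbols, not a real chemical encoding
Proposer = Callable[[List[str], random.Random], str]


def stub_proposer(parents: List[str], rng: random.Random) -> str:
    """Stand-in for an LLM call: splice two parents, then mutate one position."""
    a, b = rng.sample(parents, 2)
    cut = rng.randrange(1, len(a))
    child = list(a[:cut] + b[cut:])
    child[rng.randrange(len(child))] = rng.choice(ALPHABET)
    return "".join(child)


def objectives(s: str) -> Dict[str, float]:
    """Two competing toy objectives: fraction of 'N' and fraction of 'O'."""
    return {"frac_N": s.count("N") / len(s), "frac_O": s.count("O") / len(s)}


def scalarize(scores: Dict[str, float], prefs: Dict[str, float]) -> float:
    """Weighted sum of objective scores under the current preferences."""
    return sum(prefs[k] * scores[k] for k in prefs)


def evolve(proposer: Proposer, prefs: Dict[str, float], generations: int = 30,
           pop_size: int = 20, n_children: int = 20, seed: int = 0) -> str:
    rng = random.Random(seed)
    pop = ["".join(rng.choice(ALPHABET) for _ in range(8)) for _ in range(pop_size)]

    def fitness(s: str) -> float:
        return scalarize(objectives(s), prefs)

    for _ in range(generations):
        # Keep the better half as parents and ask the proposer for children.
        parents = sorted(pop, key=fitness, reverse=True)[: pop_size // 2]
        children = [proposer(parents, rng) for _ in range(n_children)]
        # Deduplicate in insertion order, then keep the best (elitist selection).
        merged = list(dict.fromkeys(pop + children))
        pop = sorted(merged, key=fitness, reverse=True)[:pop_size]
    return pop[0]
```

Changing the `prefs` argument between runs, for example from `{"frac_N": 1.0, "frac_O": 0.0}` to `{"frac_N": 0.0, "frac_O": 1.0}`, steers the search toward a different trade-off without touching the loop. To try an actual LLM, one would replace `stub_proposer` with a function of the same signature that prompts a model with the parent candidates and parses its reply.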

Topics

Scientific Discovery Automation
Large Language Models
Agentic AI Systems
Evolutionary Algorithms
Materials Design

Best for: AI Researcher, AI Scientist, Research Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by Ai2.