Radically Better Reasoning: Elicit's Andreas Stuhlmüller & Jungwon Byun on World Models for Research
Summary
Elicit, led by Andreas Stuhlmüller and Jungwon Byun, is developing trusted reasoning workflows for scientific research, addressing the challenge of powerful yet opaque frontier AI models. Their approach integrates process supervision, domain-specific reasoning primitives, and inspectable "world models" to ensure reliable analysis of evidence, causality, and counterfactuals. Elicit's platform, which uses a domain-specific language to orchestrate agent calls, guarantees consistent application of reasoning processes across large datasets, serving seven of the top 20 life sciences companies for tasks like drug target ranking and regulatory defense. Internally, Elicit employs "The Line," an automated software engineering system delivering 30-50 code changes weekly. The company also explores external world models for continual learning and inspectable knowledge representations, while managing significant token costs (Andreas spends ~$2,000/week) by dynamically dispatching tasks to appropriately sized models.
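The idea of dispatching tasks to appropriately sized models can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `Task` type, the 1-to-5 difficulty score, and the tier names "small", "medium", and "large" are hypothetical and do not describe Elicit's actual routing logic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    description: str
    difficulty: int  # hypothetical score: 1 (trivial) to 5 (hard)


# Hypothetical tiers, cheapest first: (highest difficulty handled, tier name).
TIERS = [(2, "small"), (4, "medium"), (5, "large")]


def pick_model(task: Task) -> str:
    """Return the cheapest tier whose ceiling covers the task's difficulty."""
    for max_difficulty, tier in TIERS:
        if task.difficulty <= max_difficulty:
            return tier
    # Anything beyond the scale falls back to the largest tier.
    return TIERS[-1][1]


tasks = [
    Task("rename a variable", 1),
    Task("summarize one abstract", 3),
    Task("design a review protocol", 5),
]
assignments = {t.description: pick_model(t) for t in tasks}
```

The design choice in this sketch is to default to the cheapest tier and escalate only when the difficulty estimate requires it, which is one plausible way to keep token spend proportional to task complexity.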
Key takeaway
For Research Scientists and Directors of AI/ML integrating AI into high-stakes scientific research, prioritize platforms that offer transparent, systematic reasoning over opaque "black box" outputs. Your teams should adopt tools like Elicit that implement process supervision and explicit world models, ensuring AI-generated conclusions are verifiable and consistently derived from evidence. This approach mitigates risks associated with models that are "easy to push around," fostering trust and improving the overall quality of decision-making.
Key insights
Elicit ensures trusted AI reasoning in scientific research via process supervision, domain-specific primitives, and inspectable world models.
Principles
- Process supervision validates AI reasoning steps, not just final answers.
- Evidence quality assessment should prioritize methodology over metadata.
- Explicit world models enable inspectable, continual AI learning.
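To make the first principle concrete, here is a minimal sketch of step-level checking, as opposed to grading only a final answer. The `ReasoningStep` type, the `has_evidence` check, and the example trace are all hypothetical; a real system would likely use far richer checks than "is evidence attached".

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class ReasoningStep:
    claim: str
    evidence: Optional[str]  # supporting quote, or None if unsupported


def supervise(
    steps: List[ReasoningStep],
    check: Callable[[ReasoningStep], bool],
) -> List[int]:
    """Return the indices of steps that fail the check."""
    return [i for i, step in enumerate(steps) if not check(step)]


def has_evidence(step: ReasoningStep) -> bool:
    """Hypothetical check: a step passes only if it cites non-empty evidence."""
    return bool(step.evidence and step.evidence.strip())


# Made-up trace for illustration only.
trace = [
    ReasoningStep("The trial was randomized.", "quoted methods sentence"),
    ReasoningStep("Therefore the effect is causal.", None),
    ReasoningStep("The effect size is clinically relevant.", "quoted results sentence"),
]
failed = supervise(trace, has_evidence)
```

In this trace the second step carries no evidence, so `failed` contains index 1 even if the final conclusion happens to be right. That is the point of supervising the process rather than the outcome.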
Method
Elicit employs a domain-specific language (DSL) to orchestrate reasoning primitives. Frontier models generate structured workflows dynamically in this DSL, and the platform then executes each workflow systematically, applying every step to every input.
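The source does not show the DSL itself, so the following is only a sketch of the general pattern: a workflow declared as an ordered list of named primitives, and a runner that applies every step to every record. The names `Step`, `run_workflow`, `extract_design`, and `grade_evidence` are hypothetical, and the two primitives are simple stand-ins for what would be model calls.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical primitive: reads one record and returns a new field value.
Primitive = Callable[[Dict[str, str]], str]


@dataclass(frozen=True)
class Step:
    name: str  # field this step writes into the record
    primitive: Primitive


def run_workflow(
    steps: List[Step], records: List[Dict[str, str]]
) -> List[Dict[str, str]]:
    """Apply every step, in order, to every record; no record is skipped."""
    results = []
    for record in records:
        row = dict(record)  # copy so the input records stay untouched
        for step in steps:
            row[step.name] = step.primitive(row)
        results.append(row)
    return results


# Stand-ins for model-backed primitives (assumptions, not Elicit's primitives).
def extract_design(row: Dict[str, str]) -> str:
    return "RCT" if "randomized" in row["abstract"].lower() else "observational"


def grade_evidence(row: Dict[str, str]) -> str:
    # Grades on methodology (study design), not on metadata.
    return "high" if row["design"] == "RCT" else "low"


workflow = [Step("design", extract_design), Step("grade", grade_evidence)]
papers = [
    {"title": "A", "abstract": "A randomized controlled trial of X."},
    {"title": "B", "abstract": "A cohort study of Y."},
]
results = run_workflow(workflow, papers)
```

Because the loop lives in the runner rather than in a model's free-form output, consistency across the dataset is a property of the executor: the model only decides which steps the workflow contains.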
In practice
- Conduct systematic literature reviews with guaranteed process consistency across large datasets.
- Use AI to rank drug targets rigorously and to justify drug launch strategies.
- Automate software engineering for bug fixes and simple features via iterative AI workflows.
Topics
- AI for Science
- Reasoning Workflows
- Process Supervision
- World Models
- Life Sciences Research
- Automated Software Engineering
Best for: Executive, AI Architect, AI Product Manager, AI Scientist, Research Scientist, Director of AI/ML
Editorial summary, takeaway, and curation by AIssential. Original article published by The Cognitive Revolution.