Meet an AI Using Scholar

2026-02-14 · Source: Ai2 · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics · Depth: Advanced, extended

Summary

Galen Adams, a senior research scientist at Brown University, discusses the application of machine learning and large language models (LLMs) to improve systematic review methods. She highlights the distinction between narrative reviews, which LLMs handle well, and systematic reviews, which require comprehensive evidence collection. Adams details the systematic review process, from defining a topic using the PICO framework to searching, screening, data extraction, and report writing. Her ongoing rapid review of AI tools for systematic reviews, covering 22 tools across 18 studies, reveals that LLMs are inconsistent and perform poorly in initial search phases, often prioritizing "perfect" articles over broad evidence. However, these tools show promise in abstract and full-text screening, especially when combining multiple models. Data extraction remains challenging for LLMs, particularly for nuanced result interpretation, though descriptive data extraction is improving. Adams also emphasizes the need for user-friendly performance measures like "number needed to read" over complex metrics like F1 scores.

Key takeaway

For AI scientists building evidence-synthesis tools: prioritize making LLM literature searches comprehensive, because current models often miss tangential but relevant studies. Focus on interactive, human-in-the-loop systems that can reliably reproduce the included-study lists of existing systematic reviews, particularly for well-defined datasets like Cochrane reviews. This approach builds trust among conservative systematic reviewers and addresses critical bottlenecks in screening and data checking, freeing up time for complex analysis and reporting.
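One way to make "reproduce the included-study list" measurable is to compare a tool's retrieved records against the list from a published review. The following is a minimal sketch, not a method described in the interview: the function name and the study identifiers are hypothetical, and it assumes both lists use the same identifier scheme.

```python
from typing import Iterable, List, Tuple


def compare_to_reference(
    retrieved: Iterable[str], included: Iterable[str]
) -> Tuple[float, List[str]]:
    """Return (recall, missed_ids) for a search against a reference review.

    `included` is the list of studies the published review included;
    `retrieved` is what the tool under test returned. Both are assumed
    to use the same identifiers (hypothetical here).
    """
    retrieved_set = set(retrieved)
    included_set = set(included)
    if not included_set:
        raise ValueError("reference review has no included studies")
    missed = sorted(included_set - retrieved_set)
    recall = (len(included_set) - len(missed)) / len(included_set)
    return recall, missed


# Hypothetical identifiers, for illustration only.
recall, missed = compare_to_reference(
    retrieved=["s1", "s2", "s3", "s9"],
    included=["s1", "s2", "s3", "s4"],
)
print(f"recall={recall:.2f}, missed={missed}")
```

Listing the missed studies, rather than reporting recall alone, lets a reviewer see which kinds of evidence the search failed to find.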

Key insights

AI tools show promise for systematic review screening and descriptive data extraction, but struggle with comprehensive search and nuanced data interpretation.

Principles

Systematic reviews demand comprehensive evidence, not just highly relevant articles.
Combining multiple AI models can improve screening performance.
User-friendly metrics like "number needed to read" are preferred over complex ones like F1 scores.
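To illustrate the last principle, the sketch below computes both metrics from the same screening counts. It assumes the usual definition of number needed to read as the reciprocal of precision, that is, how many records a reviewer reads on average per relevant one. The function name and the counts are hypothetical.

```python
def screening_metrics(tp: int, fp: int, fn: int) -> dict:
    """Compute precision, recall, F1, and number needed to read (NNR).

    tp: relevant records the tool flagged for inclusion
    fp: irrelevant records the tool flagged for inclusion
    fn: relevant records the tool missed
    NNR is taken here as 1 / precision (an assumed, commonly used definition).
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    nnr = 1 / precision if precision else float("inf")
    return {"precision": precision, "recall": recall, "f1": f1, "nnr": nnr}


# Hypothetical counts: 200 records flagged, 40 of them relevant, 10 relevant missed.
m = screening_metrics(tp=40, fp=160, fn=10)
print(f"F1={m['f1']:.2f}  NNR={m['nnr']:.1f} records read per relevant study")
```

The NNR answers the question a reviewer actually asks ("how many abstracts do I read to find one I need?"), whereas F1 folds precision and recall into a single number with no direct workload interpretation.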

Method

The systematic review process involves defining a topic (PICO), searching, abstract screening, full-text screening, data extraction, risk of bias assessment, and report writing. AI tools can be integrated into specific steps like screening and descriptive data extraction.
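The steps above can be sketched as a small data model. This is an illustrative layout only: the class names, the `ai_assisted` flag, and the example question are assumptions, and the flags simply mirror which steps the summary names as candidates for AI support.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class PICO:
    """Review question: Population, Intervention, Comparator, Outcome."""
    population: str
    intervention: str
    comparator: str
    outcome: str


@dataclass(frozen=True)
class Stage:
    name: str
    ai_assisted: bool  # True where the summary suggests AI tools can help


# Order follows the process as described in the text.
STAGES: List[Stage] = [
    Stage("define topic (PICO)", False),
    Stage("search", False),
    Stage("abstract screening", True),
    Stage("full-text screening", True),
    Stage("data extraction (descriptive fields)", True),
    Stage("risk of bias assessment", False),
    Stage("report writing", False),
]

# Hypothetical review question, for illustration only.
question = PICO(
    population="adults with type 2 diabetes",
    intervention="drug A",
    comparator="placebo",
    outcome="change in HbA1c",
)

for stage in STAGES:
    mode = "AI-assisted, human-checked" if stage.ai_assisted else "human-led"
    print(f"{stage.name}: {mode}")
```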

In practice

Use AI for abstract and full-text screening to reduce workload.
Combine different AI models for improved screening accuracy.
Focus human effort on nuanced data extraction and result interpretation.
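The summary does not say how the models were combined, so the following is one plausible scheme rather than the approach used in the reviewed studies: each model votes include or exclude, and a record goes forward to human review if at least `min_votes` models vote to include. The model names and votes are hypothetical.

```python
from typing import Dict, List, Mapping


def combine_votes(votes: Mapping[str, bool], min_votes: int = 1) -> bool:
    """Return True if at least `min_votes` models voted to include.

    min_votes=1 favors recall (any single model can keep a record),
    which suits screening, where a missed study costs more than an
    extra abstract to read. Higher values trade recall for workload.
    """
    return sum(votes.values()) >= min_votes


def screen(
    records: Mapping[str, Mapping[str, bool]], min_votes: int = 1
) -> List[str]:
    """Return the IDs of records that pass the combined vote."""
    return [rid for rid, votes in records.items() if combine_votes(votes, min_votes)]


# Hypothetical per-model decisions for three abstracts.
decisions: Dict[str, Dict[str, bool]] = {
    "rec-1": {"model_a": True, "model_b": True},
    "rec-2": {"model_a": False, "model_b": True},
    "rec-3": {"model_a": False, "model_b": False},
}

print(screen(decisions))               # any-model rule
print(screen(decisions, min_votes=2))  # unanimous rule for two models
```

Records on which the models disagree (such as `rec-2` here) are natural candidates for the human attention the last item calls for.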

Topics

Systematic Reviews
Large Language Models
Evidence Synthesis
Information Retrieval
AI-Assisted Screening

Best for: AI Scientist, AI Researcher, Research Scientist, Machine Learning Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Ai2.