An Empirical Investigation of Pre-Trained Deep Learning Model Reuse in the Scientific Process
Summary
An empirical study investigated how pre-trained deep learning models (PTMs) are reused in the natural sciences. Using an automated large language model (LLM) pipeline, the authors analyzed 17,511 peer-reviewed, open-access papers published between January 1, 2000, and December 10, 2025. The pipeline identified 631 papers that reuse PTMs, and "Biochemistry, Genetics and Molecular Biology" leads all other fields in adoption. "Adaptation" reuse, in which an existing model is fine-tuned, was the most prevalent pattern, accounting for 70.29% of reuse instances. PTM integration primarily affects the "Test" stage of the scientific process (422 instances), and the most frequently reused models were AlexNet (25), AlphaFold 2 (23), VGG 16 (19), ResNet 50 (18), and AlphaFold (18). The work characterizes current PTM reuse practices and their impact on scientific workflows.
Key takeaway
For research scientists and AI engineers integrating deep learning into scientific discovery, you should prioritize pre-trained models, especially through "adaptation" reuse, to reduce computational overhead and accelerate data-driven insights. PTM use is currently concentrated in the "Test" stage, so consider extending it to earlier stages such as hypothesis generation or experimental design to realize its benefits across the entire scientific workflow.
Key insights
Pre-trained model reuse, particularly adaptation, is increasingly common in the natural sciences and predominantly augments the "Test" stage.
Principles
- Reusing PTMs significantly reduces the cost of training deep learning models from scratch.
- "Adaptation" is the dominant PTM reuse pattern in the natural sciences.
- PTMs function as reusable scientific infrastructure.
Method
An automated LLM-driven pipeline identifies deep learning usage, verifies PTM reuse, extracts specific PTMs, classifies reuse patterns (conceptual, adaptation, deployment), and maps their impact to scientific process stages.
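The five steps of the method can be sketched as a staged filter. This is a minimal sketch, not the authors' implementation: the `ask` callable stands in for a prompted LLM, and `fake_ask`, `analyze_paper`, `PaperResult`, and the question keys are hypothetical names. The glosses of conceptual and deployment reuse in the comments follow common usage in PTM-reuse research and are assumptions; only the description of adaptation as fine-tuning comes from the study.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable


class ReusePattern(Enum):
    CONCEPTUAL = "conceptual"  # assumption: reuses the model's ideas or architecture, not its weights
    ADAPTATION = "adaptation"  # adapts pre-trained weights, e.g. by fine-tuning
    DEPLOYMENT = "deployment"  # assumption: uses the pre-trained model as-is


# Hypothetical interface: (question key, paper text) -> short answer string.
Ask = Callable[[str, str], str]


@dataclass
class PaperResult:
    paper_id: str
    uses_dl: bool = False
    reuses_ptm: bool = False
    ptms: list[str] = field(default_factory=list)
    patterns: dict[str, ReusePattern] = field(default_factory=dict)  # PTM name -> reuse pattern
    stages: dict[str, str] = field(default_factory=dict)  # PTM name -> scientific-process stage


def analyze_paper(paper_id: str, text: str, ask: Ask) -> PaperResult:
    result = PaperResult(paper_id)
    # Step 1: identify deep learning usage; stop early for papers without it.
    result.uses_dl = ask("uses_deep_learning", text) == "yes"
    if not result.uses_dl:
        return result
    # Step 2: verify that a pre-trained model is reused rather than trained from scratch.
    result.reuses_ptm = ask("reuses_ptm", text) == "yes"
    if not result.reuses_ptm:
        return result
    # Step 3: extract the names of the reused PTMs from a comma-separated answer.
    result.ptms = [name.strip() for name in ask("list_ptms", text).split(",") if name.strip()]
    for ptm in result.ptms:
        # Step 4: classify the reuse pattern; the Enum lookup raises ValueError on unexpected answers.
        result.patterns[ptm] = ReusePattern(ask(f"pattern:{ptm}", text))
        # Step 5: map this reuse to the stage of the scientific process it affects.
        result.stages[ptm] = ask(f"stage:{ptm}", text)
    return result


def fake_ask(question: str, text: str) -> str:
    """Hypothetical keyword stub standing in for an LLM call, for demonstration only."""
    lowered = text.lower()
    if question == "uses_deep_learning":
        return "yes" if "deep learning" in lowered or "neural network" in lowered else "no"
    if question == "reuses_ptm":
        return "yes" if "pre-trained" in lowered else "no"
    if question == "list_ptms":
        known = ["AlexNet", "ResNet 50", "AlphaFold 2"]  # illustrative subset of model names
        return ",".join(name for name in known if name.lower() in lowered)
    if question.startswith("pattern:"):
        return "adaptation" if "fine-tun" in lowered else "deployment"
    if question.startswith("stage:"):
        return "Test"
    return "no"


demo_text = "We fine-tuned a pre-trained ResNet 50 deep learning model to classify micrographs."
demo = analyze_paper("demo-1", demo_text, fake_ask)
print(demo.ptms, demo.patterns, demo.stages)
```

Ordering the steps as early exits means the later, more detailed questions are asked only of papers that pass the cheaper filters, which matters when every question is an LLM call over thousands of papers.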
In practice
- Adopt PTMs to mitigate deep learning computational costs.
- Focus on adaptation for domain-specific scientific tasks.
- Explore PTM integration in earlier scientific stages beyond testing.
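The adaptation advice above can be illustrated with a toy, dependency-free sketch of its simplest variant: a frozen "pre-trained" feature extractor is reused unchanged and only a new task-specific head is trained (often called linear probing; full fine-tuning would also update backbone weights). The backbone weights, the toy dataset, and every function name here are hypothetical; a real project would load published weights, such as a ResNet 50 checkpoint, through a deep learning framework.

```python
import math
import random

# Hypothetical "pre-trained" backbone: fixed weights standing in for a model trained elsewhere.
BACKBONE_WEIGHTS = [[1.0, -1.0], [0.5, 0.5], [-1.0, 1.0]]


def backbone(x):
    """Frozen feature extractor: one linear layer followed by ReLU."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in BACKBONE_WEIGHTS]


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_head(data, epochs=200, lr=0.5):
    """Fit only a new logistic-regression head by SGD; the backbone is never updated."""
    w = [0.0] * len(BACKBONE_WEIGHTS)
    b = 0.0
    feats = [(backbone(x), y) for x, y in data]  # frozen backbone, so features are computed once
    for _ in range(epochs):
        for f, y in feats:
            p = sigmoid(sum(wi * fi for wi, fi in zip(w, f)) + b)
            grad = p - y  # gradient of binary cross-entropy w.r.t. the logit
            w = [wi - lr * grad * fi for wi, fi in zip(w, f)]
            b -= lr * grad
    return w, b


def predict(head, x):
    w, b = head
    return sigmoid(sum(wi * fi for wi, fi in zip(w, backbone(x))) + b) >= 0.5


# Toy downstream task (hypothetical): label a 2-D point 1 when its first coordinate is larger.
random.seed(0)
points = [(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(100)]
data = [(x, 1 if x[0] > x[1] else 0) for x in points]
head = train_head(data)
accuracy = sum(predict(head, x) == bool(y) for x, y in data) / len(data)
print(f"training accuracy: {accuracy:.2f}")
```

Because the backbone is frozen, its features are computed once and cached, so the training loop touches only the small head; this is one concrete reason adaptation can be much cheaper than training a full model from scratch.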
Topics
- Deep Learning Model Reuse
- Pre-trained Models
- Scientific Workflows
- Empirical Software Engineering
- AI for Science
- Automated Literature Review
- Biochemistry
Best for: Research Scientist, AI Scientist, Software Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.SE updates on arXiv.org.