An Empirical Investigation of Pre-Trained Deep Learning Model Reuse in the Scientific Process

2024-07-02 · Source: cs.SE updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Software Development & Engineering, Data Science & Analytics · Depth: Advanced, extended

Summary

An empirical study investigated the reuse of pre-trained deep learning models (PTMs) in the natural sciences, analyzing 17,511 peer-reviewed, open-access papers from January 1st, 2000, to December 10th, 2025, with an automated large language model pipeline. The research identified 631 papers reusing PTMs and found that "Biochemistry, Genetics and Molecular Biology" leads other fields in adoption. "Adaptation" reuse, which involves fine-tuning existing models, was the most prevalent pattern, accounting for 70.29% of instances. PTM integration primarily affects the "Test" stage of the scientific process (422 instances). The most frequently reused models were AlexNet (25), AlphaFold 2 (23), VGG 16 (19), ResNet 50 (18), and AlphaFold (18). The work characterizes current PTM reuse practices and their impact on scientific workflows.
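To put the headline numbers in proportion, the share of analyzed papers that reuse a PTM can be computed directly from the two counts reported above. The variable names are illustrative; only the two counts come from the summary.

```python
# Counts taken from the summary above.
papers_analyzed = 17_511
papers_reusing_ptms = 631

# Fraction of the analyzed corpus in which PTM reuse was identified.
reuse_share = papers_reusing_ptms / papers_analyzed

print(f"{reuse_share:.2%}")  # → 3.60%
```

In other words, PTM reuse was identified in roughly one of every 28 papers in the corpus.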

Key takeaway

Research scientists and AI engineers integrating deep learning into scientific discovery should prioritize pre-trained models, especially through "adaptation" reuse, to reduce computational overhead and accelerate data-driven insights. PTMs are currently applied mostly in the "Test" phase; consider extending them to earlier stages, such as hypothesis generation or experimental design, so that they support the entire scientific workflow.

Key insights

Pre-trained model reuse, particularly adaptation, is increasingly common in the natural sciences and predominantly augments the "Test" stage.

Principles

PTMs significantly reduce deep learning training costs.
"Adaptation" is the dominant PTM reuse pattern in natural sciences.
PTMs function as reusable scientific infrastructure.

Method

An automated LLM-driven pipeline identifies deep learning usage, verifies PTM reuse, extracts specific PTMs, classifies reuse patterns (conceptual, adaptation, deployment), and maps their impact to scientific process stages.
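The stages described above can be sketched as a chain of questions asked about each paper, with later stages skipped once an earlier check fails. This is a minimal illustration, not the study's implementation: `PaperAnalysis`, `analyze_paper`, the question keys, and `keyword_stub` are all hypothetical names, and the keyword matcher merely stands in for the LLM so that the example runs without any external service.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

# Reuse patterns named in the summary.
REUSE_PATTERNS = ("conceptual", "adaptation", "deployment")

# Hypothetical interface: (question key, paper text) -> short answer string.
Ask = Callable[[str, str], str]


@dataclass
class PaperAnalysis:
    uses_deep_learning: bool = False
    reuses_ptm: bool = False
    ptm_names: list[str] = field(default_factory=list)
    reuse_pattern: Optional[str] = None
    process_stage: Optional[str] = None


def analyze_paper(text: str, ask: Ask) -> PaperAnalysis:
    """Run the five stages in order, stopping early when a check fails."""
    result = PaperAnalysis()
    # Stage 1: does the paper use deep learning at all?
    result.uses_deep_learning = ask("uses_deep_learning", text) == "yes"
    if not result.uses_deep_learning:
        return result
    # Stage 2: is a pre-trained model actually reused?
    result.reuses_ptm = ask("reuses_ptm", text) == "yes"
    if not result.reuses_ptm:
        return result
    # Stage 3: extract the specific PTM names.
    names = ask("ptm_names", text)
    result.ptm_names = [n.strip() for n in names.split(",") if n.strip()]
    # Stage 4: classify the reuse pattern, rejecting unknown labels.
    pattern = ask("reuse_pattern", text)
    result.reuse_pattern = pattern if pattern in REUSE_PATTERNS else None
    # Stage 5: map the reuse to a stage of the scientific process.
    result.process_stage = ask("process_stage", text) or None
    return result


def keyword_stub(question: str, text: str) -> str:
    """Keyword matcher standing in for the LLM (an assumption for this demo)."""
    lowered = text.lower()
    if question == "uses_deep_learning":
        return "yes" if "deep learning" in lowered else "no"
    if question == "reuses_ptm":
        return "yes" if "pre-trained" in lowered else "no"
    if question == "ptm_names":
        known = ["AlexNet", "VGG 16", "ResNet 50"]
        return ", ".join(name for name in known if name.lower() in lowered)
    if question == "reuse_pattern":
        return "adaptation" if "fine-tun" in lowered else "deployment"
    if question == "process_stage":
        return "Test"  # placeholder: the stub does not really infer the stage
    return ""


abstract = "We fine-tuned a pre-trained ResNet 50 deep learning model to classify cell images."
report = analyze_paper(abstract, keyword_stub)
print(report)
```

Passing the question-answering step in as a callable keeps the stage logic independent of any particular model or API, so a real LLM client could replace `keyword_stub` without touching `analyze_paper`.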

In practice

Adopt PTMs to mitigate deep learning computational costs.
Focus on adaptation for domain-specific scientific tasks.
Explore PTM integration in earlier scientific stages beyond testing.
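As an illustration of the adaptation advice above, the sketch below freezes a feature extractor and trains only a small task-specific head on top of it. This is one simple form of adaptation, not the study's definition of the category, which covers fine-tuning more broadly. Everything here is a toy assumption: `frozen_backbone` is a hand-written function standing in for a real pre-trained network, and the six labeled samples are synthetic.

```python
import math


def frozen_backbone(x):
    """Hypothetical stand-in for a pre-trained feature extractor (never updated)."""
    return [x[0] + x[1], x[0] - x[1]]


def predict(weights, bias, feats):
    """Logistic head: probability that the sample belongs to class 1."""
    z = sum(w * f for w, f in zip(weights, feats)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def mean_loss(weights, bias, data):
    """Average binary cross-entropy over (features, label) pairs."""
    total = 0.0
    for feats, label in data:
        p = predict(weights, bias, feats)
        total -= label * math.log(p) + (1 - label) * math.log(1.0 - p)
    return total / len(data)


def train_head(data, lr=0.5, epochs=200):
    """Full-batch gradient descent on the head only; the backbone stays fixed."""
    weights, bias = [0.0, 0.0], 0.0
    n = len(data)
    for _ in range(epochs):
        grad_w, grad_b = [0.0, 0.0], 0.0
        for feats, label in data:
            err = predict(weights, bias, feats) - label
            for i, f in enumerate(feats):
                grad_w[i] += err * f
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


# Synthetic domain data (an assumption): raw inputs with binary labels.
raw_samples = [
    ((1.0, 0.5), 1), ((0.8, 0.9), 1), ((0.6, 0.1), 1),
    ((-1.0, -0.2), 0), ((-0.5, -0.7), 0), ((-0.3, -0.9), 0),
]

# Features are computed once, because the backbone never changes.
features = [(frozen_backbone(x), y) for x, y in raw_samples]

initial_loss = mean_loss([0.0, 0.0], 0.0, features)
weights, bias = train_head(features)
final_loss = mean_loss(weights, bias, features)
accuracy = sum(
    (predict(weights, bias, f) > 0.5) == bool(y) for f, y in features
) / len(features)

print(f"loss before: {initial_loss:.3f}, after: {final_loss:.3f}, accuracy: {accuracy:.0%}")
```

Only three parameters are trained here (two weights and a bias), which is the source of the cost saving: the expensive part of the model is reused as-is, and the domain-specific data is needed only for the small head.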

Topics

Deep Learning Model Reuse
Pre-trained Models
Scientific Workflows
Empirical Software Engineering
AI for Science
Automated Literature Review
Biochemistry

Best for: Research Scientist, AI Scientist, Software Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.SE updates on arXiv.org.