Struggling to reproduce paper results before improving them — stuck below reported accuracy [R]

2026-05-05 · Source: Machine Learning · Field: Technology & Digital — Artificial Intelligence & Machine Learning · Depth: Advanced, medium

Summary

A PhD student in AI/computer vision cannot reproduce a published paper's reported accuracy: their runs consistently reach about 73%, against the paper's claimed 77%. They have carefully checked implementation details, preprocessing, hyperparameters, random seeds, and the evaluation protocol, and have tried to contact the authors without success, yet still cannot match the paper's number. Commenters describe this kind of reproducibility gap as common in machine learning and computer vision research, attributing it to undisclosed implementation details, specific data splits, or even "cheated" numbers. The discussion suggests that many published results are closer to "suggestions" than to fully reproducible findings, even when code is released.

Key takeaway

If you are an AI scientist or PhD student building on published work and hit a significant reproducibility gap, document your reproduction attempts thoroughly. Report the baseline you can consistently achieve yourself and frame your improvements relative to it. Make your own code and environment fully reproducible to contribute to scientific transparency, rather than getting stuck chasing results that may be unachievable or depend on undocumented details.

Key insights

Reproducibility gaps are common in ML research due to missing details or undisclosed methods.

Principles

Report your own reproducible baseline.
Publish code together with a container environment.
Reviewers mostly care that you beat your own reproduced baseline.
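
Reporting a reproducible baseline starts with controlling randomness in your own runs. The following is a minimal sketch, assuming a Python project that may use NumPy and PyTorch; `seed_everything` is a hypothetical helper name, not an API of either library. Seeding narrows run-to-run variation but does not by itself guarantee bit-identical results across GPUs, drivers, or library versions, which is one reason to ship a container as well.

```python
import os
import random

try:
    import numpy as np
except ImportError:  # NumPy is optional in this sketch
    np = None

try:
    import torch
except ImportError:  # PyTorch is optional in this sketch
    torch = None


def seed_everything(seed: int, deterministic: bool = True) -> None:
    """Seed the Python, NumPy and PyTorch RNGs so a run can be repeated."""
    random.seed(seed)
    # Only affects Python processes started after this call; the current
    # interpreter's hash seed is fixed at start-up.
    os.environ["PYTHONHASHSEED"] = str(seed)
    if np is not None:
        np.random.seed(seed)
    if torch is not None:
        torch.manual_seed(seed)  # seeds the CPU and all CUDA devices
        if deterministic:
            # Needed by cuBLAS (CUDA 10.2+) for deterministic results; it only
            # takes effect if set before cuBLAS is first used.
            os.environ.setdefault("CUBLAS_WORKSPACE_CONFIG", ":4096:8")
            # Make PyTorch raise an error for ops without a deterministic implementation.
            torch.use_deterministic_algorithms(True)
            torch.backends.cudnn.benchmark = False
```

Call it once at the top of each training script and log the seed with the results. With `deterministic=True`, PyTorch may raise an error for operations that lack a deterministic implementation, which is itself useful information when hunting a gap.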

Method

When facing reproducibility issues, try varying the model initialization seeds and the train/val/test splits, and match ML library versions to those available when the paper was published. Intentionally overfit a small subset of the data to diagnose whether the divergence stems from the architecture or from the data/evaluation pipeline.
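
Both diagnostics can be sketched in plain Python. `gap_in_stds` measures how far the paper's number lies from the spread of your own runs across seeds (or across random splits): if it sits many standard deviations away, seed noise alone is an unlikely explanation. `can_overfit` wraps the small-subset check around whatever training step and accuracy function your project already has. As a rough heuristic, if the model cannot memorize a handful of examples, suspect the architecture or training code; if it can, the divergence more likely lies in the data or evaluation pipeline. All names are hypothetical, and the perceptron is only a runnable toy standing in for a real network.

```python
import math
import statistics
from typing import Callable, Sequence


def gap_in_stds(reported: float, runs: Sequence[float]) -> float:
    """Distance of the reported score from your mean, in sample standard deviations.

    `runs` holds one score per seed (or per random split) from your own runs.
    """
    if len(runs) < 2:
        raise ValueError("need at least two runs to estimate the spread")
    mean = statistics.mean(runs)
    std = statistics.stdev(runs)
    if std == 0:
        return 0.0 if reported == mean else math.copysign(math.inf, reported - mean)
    return (reported - mean) / std


def can_overfit(
    train_step: Callable[[], None],
    train_accuracy: Callable[[], float],
    max_steps: int = 1000,
    target: float = 1.0,
) -> bool:
    """Return True if repeated training steps drive accuracy on the tiny subset to `target`."""
    for _ in range(max_steps):
        train_step()
        if train_accuracy() >= target:
            return True
    return False


# Toy stand-in for a real model and "small subset": a perceptron on four
# linearly separable 2-D points (hypothetical data, for illustration only).
POINTS = [((0.0, 1.0), 1), ((1.0, 2.0), 1), ((2.0, 0.0), 0), ((3.0, 1.0), 0)]
weights = [0.0, 0.0, 0.0]  # w_x, w_y, bias


def predict(x) -> int:
    score = weights[0] * x[0] + weights[1] * x[1] + weights[2]
    return 1 if score > 0 else 0


def perceptron_epoch() -> None:
    # One pass of the classic perceptron update rule over the toy subset.
    for x, y in POINTS:
        error = y - predict(x)
        weights[0] += error * x[0]
        weights[1] += error * x[1]
        weights[2] += error


def toy_accuracy() -> float:
    return sum(predict(x) == y for x, y in POINTS) / len(POINTS)


print(can_overfit(perceptron_epoch, toy_accuracy))  # prints True
```

With hypothetical per-seed accuracies of 72%, 73% and 74%, a reported 77% lies 4 standard deviations above the mean. Three runs give only a rough spread estimate, so more seeds make the comparison more trustworthy.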

In practice

Report both the paper's stated baseline and your own experimental one.
Make your own research code easily runnable.
Test with different PyTorch/ML library versions.
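
These practices can be supported by two small helpers: one records the interpreter and library versions behind every number, so runs under different PyTorch versions can be compared, and one prints the stated and reproduced baselines side by side with the gain measured against the reproduced one. This is a minimal sketch; the function names, the output format, and the default package list are assumptions to adapt to your project.

```python
import platform
from importlib import metadata
from typing import Dict, Iterable


def environment_report(
    packages: Iterable[str] = ("torch", "torchvision", "numpy"),
) -> Dict[str, str]:
    """Collect interpreter and package versions to store next to each result."""
    report = {"python": platform.python_version(), "platform": platform.platform()}
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "not installed"
    return report


def format_comparison(stated: float, reproduced: float, ours: float) -> str:
    """Report both baselines and measure the gain against the one you reproduced."""
    return (
        f"stated baseline {stated:.1f} | reproduced baseline {reproduced:.1f} | "
        f"ours {ours:.1f} | gain vs reproduced {ours - reproduced:+.1f}"
    )
```

For the case in the post, `format_comparison(77.0, 73.0, ours)` makes clear that any claimed improvement is measured from 73%, not 77%. Saving `environment_report()` alongside each checkpoint makes it easier to spot when a gap tracks a library version.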

Topics

Reproducibility Gap
Machine Learning Research
Computer Vision
Baseline Accuracy
Hyperparameter Tuning

Best for: AI Student, AI Scientist, Research Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by Machine Learning.