Beyond the PDF: Rowan Cockett on Reproducible, Composable Science
Summary
Rowan Cockett, co-founder and CEO of CurveNote and co-founder of the Continuous Science Foundation, discusses building data systems that make scientific research more reproducible, reusable, and easier to communicate. He traces the reproducibility crisis to socio-technical causes: problems with data integrity and access, entrenched publishing incentives, and PDF-centric workflows. Cockett highlights open standards and tools such as Jupyter, Jupyter Book, and cloud-optimized formats like Zarr, along with graceful degradation strategies for interactive research. He explains how CurveNote supports interactive, reproducible articles with on-demand compute and delegated storage for large datasets, and how community efforts such as the Continuous Science Foundation and Creative Commons initiatives aim to improve credit, licensing, and attribution. He also describes the Open Exchange Architecture (OXA) initiative, a modular, computational standard for sharing science, emphasizing that progress depends on interoperability and composability across data, code, and narrative.
Key takeaway
AI scientists who want their research to have more impact and be easier to collaborate on should prioritize computational narrative tools and open standards. Platforms that integrate data, code, and narrative let your work move beyond static PDFs and deliver interactive, reproducible results. This approach improves discoverability and reuse, aligns with evolving academic credit systems, and accelerates scientific progress by making your research composable and easy for peers to verify.
Key insights
Reproducibility in science requires integrating data, code, and narrative through open standards and modular systems.
Principles
- Scientific integrity relies on transparent data and transparent processing pipelines.
- Reward early sharing and proper attribution with academic credit.
- Design research systems for graceful degradation over time.
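One way to picture graceful degradation is as an ordered list of fallbacks: try the richest version of a result first, then fall back to simpler ones. The sketch below is illustrative only and is not taken from the episode or from any particular tool. The tier names and the functions `live_compute`, `cached_output`, and `static_figure` are hypothetical stand-ins.

```python
from typing import Callable, List, Tuple

Renderer = Callable[[], str]


def render_with_fallback(tiers: List[Tuple[str, Renderer]]) -> Tuple[str, str]:
    """Try each tier in order and return (tier_name, content) for the first that works."""
    errors = []
    for name, render in tiers:
        try:
            return name, render()
        except Exception as exc:  # any failure means "degrade to the next tier"
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all tiers failed: " + "; ".join(errors))


# Hypothetical tiers, from richest to simplest.
def live_compute() -> str:
    # Simulates a compute backend that is no longer reachable.
    raise ConnectionError("compute backend unavailable")


def cached_output() -> str:
    return "<cached interactive plot>"


def static_figure() -> str:
    return "<static image of the plot>"


tier, content = render_with_fallback(
    [("live", live_compute), ("cached", cached_output), ("static", static_figure)]
)
print(tier, content)
```

The ordering is the design choice that matters here: a reader always gets something, even years later when the live backend is gone.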
Method
CurveNote enables interactive, reproducible articles by spinning up cloud-based Jupyter servers on demand, connecting to the required environments, and re-running the work to reproduce results. Storage for large datasets is delegated to specialized partners.
In practice
- Use Jupyter notebooks to write in a literate programming style.
- Adopt cloud-optimized data formats like Zarr.
- Explore tools like CurveNote for interactive article publishing.
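To make the literate programming point concrete: a Jupyter notebook is a JSON document in which narrative (markdown) cells and code cells sit side by side. The sketch below builds a minimal notebook structure using only the standard library. The helper `build_notebook` and the cell contents are hypothetical, and the example assumes the version 4 notebook format (minor version 4, which does not require cell ids).

```python
import json


def build_notebook(cells):
    """Build a minimal nbformat-4 notebook dict from (cell_type, source) pairs."""
    nb_cells = []
    for kind, source in cells:
        cell = {"cell_type": kind, "metadata": {}, "source": source}
        if kind == "code":
            # Code cells carry execution state; it is empty until the cell is run.
            cell["execution_count"] = None
            cell["outputs"] = []
        nb_cells.append(cell)
    return {"cells": nb_cells, "metadata": {}, "nbformat": 4, "nbformat_minor": 4}


notebook = build_notebook(
    [
        ("markdown", "# Mean of a sample\nWe compute the mean of three measurements."),
        ("code", "values = [1.0, 2.0, 3.0]\nsum(values) / len(values)"),
    ]
)

# Serialized text like this is what an .ipynb file contains.
text = json.dumps(notebook, indent=1)
print(text)
```

Because narrative and code live in one structured file, other tools can read, re-run, or re-render the same content without going through a PDF.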
Topics
- Scientific Reproducibility
- Data Management
- Open Science Standards
- Computational Narratives
- Jupyter Ecosystem
Best for: AI Scientist, Research Scientist, Data Engineer, Software Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Data Engineering Podcast.