Beyond the PDF: Rowan Cockett on Reproducible, Composable Science

· Source: Data Engineering Podcast · Field: Technology & Digital — Data Science & Analytics, Software Development & Engineering, Cloud Computing & IT Infrastructure · Depth: Advanced, extended

Summary

Rowan Cockett, co-founder and CEO of Curvenote and co-founder of the Continuous Science Foundation, discusses building data systems that make scientific research more reproducible, more reusable, and easier to communicate. He traces the reproducibility crisis to socio-technical roots: data integrity, access issues, entrenched publishing incentives, and PDF-centric workflows. Cockett highlights open standards and tools such as Jupyter, Jupyter Book, and cloud-optimized formats like Zarr, alongside graceful degradation strategies that keep interactive research readable when live compute is unavailable. He details how Curvenote supports interactive, reproducible articles with on-demand compute and delegated storage for large datasets, while community efforts like the Continuous Science Foundation and Creative Commons initiatives aim to improve credit, licensing, and attribution. He also mentions the Open Exchange Architecture (OXA) initiative, a modular, computational standard for sharing science, and argues that progress depends on interoperability and composability across data, code, and narrative.

Key takeaway

For AI Scientists aiming to enhance research impact and collaboration, prioritize adopting computational narrative tools and open standards. Your work can move beyond static PDFs by leveraging platforms that integrate data, code, and narrative, enabling interactive and reproducible results. This approach not only improves discoverability and reuse but also aligns with evolving academic credit systems, accelerating scientific progress by making your research composable and easily verifiable by peers.

Key insights

Reproducibility in science requires integrating data, code, and narrative through open standards and modular systems.

Method

Curvenote enables interactive, reproducible articles by starting cloud-based Jupyter servers on demand, connecting to the article's computational environment, and rerunning the analysis to reproduce its results. Storage of large datasets is delegated to specialized partners.
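As a rough illustration of how on-demand compute and graceful degradation can fit together, the sketch below tries to recompute a result and falls back to a stored copy if no compute backend is reachable. It is not Curvenote's implementation: the names `render_with_fallback`, `RenderedOutput`, and `unavailable_backend` are hypothetical, and the `recompute` callable merely stands in for "start a Jupyter kernel and rerun the cell".

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class RenderedOutput:
    content: str  # what the reader sees
    source: str   # "live" if recomputed, "static" if served from the stored copy


def render_with_fallback(
    recompute: Optional[Callable[[], str]],
    stored_output: str,
) -> RenderedOutput:
    """Try to recompute a result; fall back to the stored output on failure.

    `recompute` is a hypothetical stand-in for launching a kernel and
    rerunning the analysis. Pass None to model a reader with no compute.
    """
    if recompute is not None:
        try:
            return RenderedOutput(content=recompute(), source="live")
        except Exception:
            # Compute is unavailable or the run failed: degrade to the
            # stored output instead of breaking the article.
            pass
    return RenderedOutput(content=stored_output, source="static")


def unavailable_backend() -> str:
    """Simulate a compute backend that cannot be reached."""
    raise ConnectionError("no compute backend reachable")


if __name__ == "__main__":
    print(render_with_fallback(lambda: "mean = 4.2", "mean = 4.2 (stored)"))
    print(render_with_fallback(unavailable_backend, "mean = 4.2 (stored)"))
```

The design choice worth noting is that the stored output is always supplied, so the article stays readable whether or not the live path succeeds; the `source` field lets the page tell the reader which version is on screen.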

Best for: AI Scientist, Research Scientist, Data Engineer, Software Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Data Engineering Podcast.