Beyond the PDF: Rowan Cockett on Reproducible, Composable Science

2026-03-22 · Source: Data Engineering Podcast · Field: Technology & Digital — Data Science & Analytics, Software Development & Engineering, Cloud Computing & IT Infrastructure · Depth: Advanced, extended

Summary

Rowan Cockett, co-founder and CEO of CurveNote and co-founder of the Continuous Science Foundation, discusses building data systems that make scientific research more reproducible, reusable, and easier to communicate. He traces the reproducibility crisis to socio-technical roots: data integrity and access problems, entrenched publishing incentives, and PDF-centric workflows. Cockett highlights open standards and tools such as Jupyter, Jupyter Book, and cloud-optimized formats such as Zarr, along with graceful degradation strategies for interactive research. He explains how CurveNote supports interactive, reproducible articles with on-demand compute while delegating large dataset storage to partners, and how community efforts such as the Continuous Science Foundation and Creative Commons initiatives aim to improve credit, licensing, and attribution. He also describes the Open Exchange Architecture (OXA) initiative, a modular, computational standard for sharing science, and argues that real progress depends on interoperability and composability across data, code, and narrative.

Key takeaway

For AI Scientists aiming to enhance research impact and collaboration, prioritize adopting computational narrative tools and open standards. Your work can move beyond static PDFs by leveraging platforms that integrate data, code, and narrative, enabling interactive and reproducible results. This approach not only improves discoverability and reuse but also aligns with evolving academic credit systems, accelerating scientific progress by making your research composable and easily verifiable by peers.

Key insights

Reproducibility in science requires integrating data, code, and narrative through open standards and modular systems.

Principles

Scientific integrity relies on transparent data and transparent processing pipelines.
Reward early sharing and proper attribution with academic credit.
Design research systems for graceful degradation over time.
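
One way to picture graceful degradation is an ordered chain of rendering tiers, where an article falls back from live compute to stored output to plain text as infrastructure ages. The sketch below is illustrative only: the tier names and the functions live_compute, cached_output, and static_text are hypothetical stand-ins, not part of CurveNote, Jupyter, or any tool mentioned in the episode.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Rendered:
    level: str    # name of the tier that produced the content
    content: str  # what the reader actually sees


def render_with_fallback(tiers: List[Tuple[str, Callable[[], str]]]) -> Rendered:
    """Try each (level, producer) pair in order and return the first success."""
    errors = []
    for level, producer in tiers:
        try:
            return Rendered(level, producer())
        except Exception as exc:  # any failure means: degrade to the next tier
            errors.append(f"{level}: {exc}")
    raise RuntimeError("all tiers failed: " + "; ".join(errors))


def live_compute() -> str:
    # Hypothetical: stands in for re-executing the analysis on a remote kernel.
    raise ConnectionError("compute backend unavailable")


def cached_output() -> str:
    # Hypothetical: stands in for output saved when the article was last built.
    return "<figure: cached plot from last successful run>"


def static_text() -> str:
    # Last resort: a plain description that needs no infrastructure at all.
    return "Figure 1 (static description)."


result = render_with_fallback([
    ("live", live_compute),
    ("cached", cached_output),
    ("static", static_text),
])
print(result.level, "->", result.content)
```

Here the live tier fails, so the reader gets the cached figure instead of an error. The design choice is that every interactive element ships with at least one tier that has no external dependencies.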

Method

CurveNote enables interactive, reproducible articles by spinning up cloud-based Jupyter servers on demand, connecting to environments, and reproducing results, while delegating large dataset storage to specialized partners.

In practice

Use Jupyter notebooks to write analyses in a literate programming style.
Adopt cloud-optimized data formats like Zarr.
Explore tools like CurveNote for interactive article publishing.
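
To make the appeal of chunked, cloud-optimized formats concrete, the following sketch computes which chunks a reader would need to fetch for a given slice of a one-dimensional array. It is a simplified illustration of the chunking idea in plain Python, not the Zarr API; the function chunks_for_slice and the array and chunk sizes are assumptions chosen for the example.

```python
from math import ceil
from typing import List


def chunks_for_slice(start: int, stop: int, chunk_size: int) -> List[int]:
    """Return the indices of chunks overlapping the half-open range [start, stop)."""
    if start >= stop:
        return []  # empty slice touches no chunks
    first = start // chunk_size
    last = (stop - 1) // chunk_size  # stop is exclusive
    return list(range(first, last + 1))


# Assumed layout: one million elements stored in chunks of ten thousand.
array_length = 1_000_000
chunk_size = 10_000
total_chunks = ceil(array_length / chunk_size)

# A reader asking for elements 25,000..47,499 needs only a few chunks.
needed = chunks_for_slice(25_000, 47_500, chunk_size)
print(f"fetch {len(needed)} of {total_chunks} chunks: {needed}")
```

Because each chunk can be stored as a separate object, a client can request only the chunks a slice touches rather than downloading the whole dataset, which is what makes such formats practical over cloud object storage.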

Topics

Scientific Reproducibility
Data Management
Open Science Standards
Computational Narratives
Jupyter Ecosystem

Best for: AI Scientist, Research Scientist, Data Engineer, Software Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Data Engineering Podcast.