Reproducibility and Open Science

Published: May 24, 2026

Reproducibility in AI-biology has two linked meanings: computational reproducibility and experimental reproducibility. A notebook that reruns is not enough if the assay cannot be repeated.

Learning Objectives
  • Define reproducibility across code, data, model, and experiment.
  • Use open science practices without ignoring dual-use and privacy concerns.
  • Document enough context for independent review.

TL;DR

Open code and open weights help, but life sciences reproducibility also needs protocol detail, reagent provenance, data versioning, and assay records. Scientific openness and risk management must be handled together.

Introduction

Bridge2AI and IGoR both treat data and protocol infrastructure as prerequisites for AI-supported science (NIH Common Fund Bridge2AI, 2026; ARPA-H IGoR, 2026). Open models such as Boltz-1 also reflect the demand for transparent biomolecular modeling stacks (Boltz-1, 2024).

Demonstrated

Demonstrated capability includes public datasets, open-source model releases, protocol repositories, and reproducible benchmark scripts. AlphaFold DB and the ESM Metagenomic Atlas demonstrate the research value of large public predicted-structure resources (AlphaFold Protein Structure Database, 2026; ESM Metagenomic Atlas, 2026).

Evidence Anchor | What It Supports | Practical Constraint
--------------- | ---------------- | --------------------
AlphaFold DB | Public predicted protein structure resource | Predictions require confidence-aware use
Boltz-1 | Open biomolecular interaction model ecosystem | Open release does not replace independent validation
Bridge2AI | Standards, tools, and training for AI-ready data | Ethics and quality are reproducibility requirements
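
As one illustration of the "confidence-aware use" constraint listed for AlphaFold DB, the sketch below reads per-residue pLDDT scores from PDB-format text and keeps only residues above a cutoff. It relies on the fact that AlphaFold DB PDB files store pLDDT in the B-factor column. The function names, the helper `make_atom_line` (used only to build demo input), and the default cutoff of 70 are assumptions for illustration, not part of any AlphaFold tooling. The appropriate cutoff depends on the downstream analysis.

```python
def make_atom_line(serial, atom, resname, chain, resnum, plddt,
                   x=0.0, y=0.0, z=0.0):
    """Hypothetical helper: build one fixed-width PDB ATOM record for demos."""
    return (
        f"ATOM  {serial:>5} {atom:^4} {resname:>3} {chain}{resnum:>4}    "
        f"{x:>8.3f}{y:>8.3f}{z:>8.3f}{1.0:>6.2f}{plddt:>6.2f}"
    )


def plddt_by_residue(pdb_text):
    """Return {(chain, residue_number): pLDDT} from AlphaFold-style PDB text.

    Assumes pLDDT is stored in the B-factor field (columns 61-66).
    """
    scores = {}
    for line in pdb_text.splitlines():
        # Use the C-alpha atom as the single representative of each residue.
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            chain = line[21]
            resnum = int(line[22:26])
            scores[(chain, resnum)] = float(line[60:66])
    return scores


def confident_residues(scores, threshold=70.0):
    """Return sorted residue keys whose pLDDT meets the threshold."""
    return sorted(key for key, value in scores.items() if value >= threshold)


if __name__ == "__main__":
    demo = "\n".join([
        make_atom_line(1, "CA", "MET", "A", 1, 92.5),
        make_atom_line(2, "CA", "GLY", "A", 2, 45.25),
    ])
    print(confident_residues(plddt_by_residue(demo)))
```

Filtering like this only flags low-confidence regions; it does not validate the remaining predictions experimentally.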

Theoretical

In principle, reproducible research networks could let protocols, agents, data, and instruments share interoperable records. Reaching that goal requires common schemas, incentives, and institutional support.

Beyond Current Capabilities

Fully reproducible biology from computational artifacts alone remains beyond current capabilities. Physical experiments also depend on materials, instruments, environments, and local expertise, which code and data releases do not fully capture.

Practice Notes

  • Release code, model version, data version, and filtering logic when possible.
  • Archive protocol details, reagent lots, instrument settings, and failed runs.
  • Use model cards and dataset cards for external readers.
  • Use access controls when openness conflicts with privacy, contracts, or safety.
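
The practice notes above can be sketched as a single run manifest that travels with each experiment. This is a minimal illustration, not a standard: the `RunManifest` class, its field names, the `access_level` values, and the choice of SHA-256 for data versioning are all assumptions made for this example.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field


def sha256_of_bytes(data: bytes) -> str:
    """Content hash used here as a simple data-version identifier."""
    return hashlib.sha256(data).hexdigest()


# Hypothetical set of fields that must be filled before a record is shared.
REQUIRED_FIELDS = (
    "code_version", "model_version", "data_sha256", "filtering_logic",
    "protocol_id", "reagent_lots", "instrument_settings",
)


@dataclass
class RunManifest:
    code_version: str            # e.g. a git commit hash
    model_version: str           # model or weights identifier
    data_sha256: str             # hash of the exact input data
    filtering_logic: str         # description of, or pointer to, filter code
    protocol_id: str             # identifier in a protocol repository
    reagent_lots: dict = field(default_factory=dict)
    instrument_settings: dict = field(default_factory=dict)
    failed_runs: list = field(default_factory=list)  # keep failures too
    access_level: str = "open"   # e.g. "open", "controlled", "restricted"

    def missing_fields(self):
        """List required fields that are still empty."""
        record = asdict(self)
        return [name for name in REQUIRED_FIELDS if not record[name]]

    def to_json(self) -> str:
        """Serialize with sorted keys so diffs between runs stay readable."""
        return json.dumps(asdict(self), indent=2, sort_keys=True)


if __name__ == "__main__":
    manifest = RunManifest(
        code_version="abc123",
        model_version="example-model-v1",
        data_sha256=sha256_of_bytes(b"example input data"),
        filtering_logic="drop rows with missing assay readout",
        protocol_id="PROTO-0001",
    )
    print(manifest.missing_fields())
    print(manifest.to_json())
```

Keeping `access_level` in the same record as the provenance fields is one way to handle openness and risk management together rather than as separate processes.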