Reproducibility and Open Science
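Reproducibility in AI-enabled biology has two linked meanings: computational reproducibility, where the same code, data, and model yield the same outputs, and experimental reproducibility, where an independent lab can repeat the assay and obtain consistent results. A notebook that reruns is not enough if the assay cannot be repeated.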
- Define reproducibility across code, data, model, and experiment.
- Use open science practices without ignoring dual-use and privacy concerns.
- Document enough context for independent review.
Open code and open weights help, but life sciences reproducibility also needs protocol detail, reagent provenance, data versioning, and assay records. Scientific openness and risk management must be handled together.
Introduction
Bridge2AI and IGoR both treat data and protocol infrastructure as prerequisites for AI-supported science (NIH Common Fund Bridge2AI, 2026; ARPA-H IGoR, 2026). Open models such as Boltz-1 also reflect the demand for transparent biomolecular modeling stacks (Boltz-1, 2024).
Demonstrated
Demonstrated capability includes public datasets, open-source model releases, protocol repositories, and reproducible benchmark scripts. AlphaFold DB and the ESM Metagenomic Atlas demonstrate the research value of large public predicted-structure resources (AlphaFold Protein Structure Database, 2026; ESM Metagenomic Atlas, 2026).
| Evidence Anchor | What It Supports | Practical Constraint |
|---|---|---|
| AlphaFold DB | Public predicted protein structure resource | Predictions require confidence-aware use |
| Boltz-1 | Open biomolecular interaction model ecosystem | Open release does not replace independent validation |
| Bridge2AI | Standards, tools, and training for AI-ready data | Ethics and quality are reproducibility requirements |
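The "confidence-aware use" constraint in the table can be made concrete with a small sketch. AlphaFold DB reports a per-residue pLDDT score on a 0 to 100 scale; the code below assumes those scores have already been extracted into a plain list. The function names, the threshold of 70, and the minimum segment length are illustrative assumptions, not part of any AlphaFold tooling.

```python
from typing import List, Sequence, Tuple


def confident_fraction(plddt: Sequence[float], threshold: float = 70.0) -> float:
    """Return the fraction of residues whose pLDDT is at or above the threshold."""
    if not plddt:
        raise ValueError("no per-residue scores supplied")
    return sum(1 for score in plddt if score >= threshold) / len(plddt)


def confident_segments(
    plddt: Sequence[float],
    threshold: float = 70.0,  # assumed cutoff; choose per use case
    min_length: int = 3,      # assumed minimum run length
) -> List[Tuple[int, int]]:
    """Return (start, end) index pairs, 0-based and end-exclusive, for
    contiguous runs of residues at or above the threshold."""
    segments: List[Tuple[int, int]] = []
    start = None
    for i, score in enumerate(plddt):
        if score >= threshold:
            if start is None:
                start = i  # a new confident run begins here
        else:
            if start is not None and i - start >= min_length:
                segments.append((start, i))
            start = None
    # Close a run that extends to the final residue.
    if start is not None and len(plddt) - start >= min_length:
        segments.append((start, len(plddt)))
    return segments


if __name__ == "__main__":
    scores = [95, 92, 88, 40, 35, 75, 80, 71, 90, 30]
    print(confident_fraction(scores))
    print(confident_segments(scores))
```

Downstream analysis can then be restricted to the returned segments, with the threshold recorded alongside the results so that a reader can repeat the filtering step.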
Theoretical
Theoretical capability includes reproducible research networks where protocols, agents, data, and instruments share interoperable records. That goal requires common schemas, incentives, and institutional support.
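As a rough illustration of what a common schema might look like, the sketch below defines one shared record type that links a protocol, an agent, a dataset version, and an instrument, and rejects payloads that do not match the agreed fields. The class name `ExperimentRecord` and every field name are hypothetical; no existing standard is implied.

```python
import json
from dataclasses import asdict, dataclass, fields


@dataclass(frozen=True)
class ExperimentRecord:
    """Hypothetical shared record; field names are illustrative only."""
    record_id: str
    protocol_id: str
    protocol_version: str
    agent_id: str          # human operator or software agent
    dataset_version: str
    instrument_id: str
    started_at: str        # ISO 8601 timestamp, kept as text for portability


def to_json(record: ExperimentRecord) -> str:
    """Serialize with sorted keys so identical records give identical text."""
    return json.dumps(asdict(record), sort_keys=True)


def from_json(text: str) -> ExperimentRecord:
    """Parse a record, refusing payloads with missing or unknown fields."""
    payload = json.loads(text)
    expected = {f.name for f in fields(ExperimentRecord)}
    received = set(payload)
    missing = expected - received
    unknown = received - expected
    if missing or unknown:
        raise ValueError(
            f"schema mismatch: missing={sorted(missing)}, unknown={sorted(unknown)}"
        )
    return ExperimentRecord(**payload)


if __name__ == "__main__":
    record = ExperimentRecord(
        record_id="rec-001",
        protocol_id="prot-42",
        protocol_version="1.3",
        agent_id="agent-a",
        dataset_version="2024.1",
        instrument_id="reader-7",
        started_at="2024-05-01T09:00:00Z",
    )
    assert from_json(to_json(record)) == record
```

Strict rejection of unknown fields is one design choice among several; a real network would also need versioning and migration rules, which are the kind of institutional agreement the paragraph above refers to.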
Beyond Current Capabilities
Fully reproducing biology from computational artifacts alone remains beyond current capabilities. Physical experiments require materials, instruments, environments, and local expertise that code, data, and model weights do not capture.
Practice Notes
- Release code, model version, data version, and filtering logic when possible.
- Archive protocol details, reagent lots, instrument settings, and failed runs.
- Use model cards and dataset cards for external readers.
- Use access controls when openness conflicts with privacy, contracts, or safety.
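The first two notes can be combined into a single run manifest. The sketch below is one possible shape, using only the Python standard library: it records code and model versions, SHA-256 hashes of the data files, the filtering parameters, and reagent lots, and it can later report which files have changed. All function and key names are assumptions chosen for illustration.

```python
import hashlib
from pathlib import Path
from typing import Dict, Iterable, List, Mapping


def sha256_of(path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large datasets do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(
    code_version: str,
    model_version: str,
    data_files: Iterable,
    filter_params: Mapping[str, object],
    reagent_lots: Mapping[str, str],
) -> Dict[str, object]:
    """Collect the versioned inputs of one run into a plain dictionary.
    Files are keyed by base name, so names are assumed to be unique."""
    return {
        "code_version": code_version,
        "model_version": model_version,
        "data": {Path(p).name: sha256_of(p) for p in data_files},
        "filter_params": dict(filter_params),
        "reagent_lots": dict(reagent_lots),
    }


def changed_files(manifest: Mapping[str, object], data_files: Iterable) -> List[str]:
    """Return names of files that are absent from the manifest or whose
    current hash differs from the recorded one."""
    recorded = manifest["data"]
    changed = []
    for p in data_files:
        name = Path(p).name
        if recorded.get(name) != sha256_of(p):
            changed.append(name)
    return sorted(changed)
```

A manifest like this can be written to JSON and archived next to the protocol record. Where a field is sensitive, for example under a privacy or contractual constraint, the hash can still be shared while the underlying file stays behind access controls.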