The Life Sciences AI Handbook: Steering Frontier Models in Biology

Tegomoh, Bryan

AI for the Life Sciences

Published

July 7, 2026

Life sciences AI is the application of machine learning systems to biological objects: sequences, structures, cells, tissues, organisms, experiments. Its center of gravity is upstream of clinical care: molecules generated before they are synthesized, cells represented before they are perturbed, experiments planned before they are run. AlphaFold 2 made over 214 million predicted structures freely available to researchers (Varadi et al., 2024). RFdiffusion designs proteins that bind, fold, and function in laboratory experiments (Watson et al., 2023). Coscientist plans and executes chemistry on cloud robotics with minimal human input (Boiko et al., 2023). None of this removes the experiment. The work is choosing the right one.

Learning Objectives

Use this chapter to:

Define life sciences AI as the discovery layer for molecules, cells, tissues, organisms, environments, experiments, and research systems.
Treat model output as a hypothesis until a benchmark, experiment, or deployment context gives it weight.

Prerequisites: none for orientation. The rest of the handbook assumes you have read this chapter and the Executive Summary.

Chapter Summary (TL;DR)

Summary: Life sciences AI is the discovery layer for molecules, cells, tissues, organisms, environments, experiments, and research systems. The field has mature anchors in structure prediction and protein design, faster-moving evidence in cells and genomes, and earlier claims in autonomous discovery.

Key point: Model output is treated as a hypothesis until a benchmark, experiment, or deployment context gives it weight. Open question: how often current AI systems improve experiments outside the tasks where they were built.

Bottom line: The chapter sets the grammar for the rest of the handbook: biological object, task, evidence, confidence, and downstream decision.

Field Guide

What is this field trying to solve? Life sciences AI is the discovery layer for molecules, cells, tissues, organisms, environments, experiments, and research systems. The problem it addresses is choosing the right experiment.

What is the core idea? Model output is treated as a hypothesis until a benchmark, experiment, or deployment context gives it weight.

What is the current state of the field? The field has mature anchors in structure prediction and protein design, faster-moving evidence in cells and genomes, and earlier claims in autonomous discovery.

What do we know, and what remains open? Known reference points include AlphaFold, RFdiffusion, Evo, scGPT, Geneformer, GEARS, Coscientist, Virtual Lab, CASP, PoseBusters, DOME, and biology-aware benchmark suites. What remains open is how often those systems improve experiments outside the tasks where they were built.

Why does this matter? The chapter sets the grammar for the rest of the handbook: biological object, task, evidence, confidence, and downstream decision.

Introduction: What This Handbook Is For

October 2024, Stockholm: The Royal Swedish Academy of Sciences awards the Nobel Prize in Chemistry jointly to Demis Hassabis and John Jumper for AlphaFold, and to David Baker for computational protein design. The prize recognizes a decade of work in which deep learning moved from a tool researchers tried, to a tool researchers depended on, to (in the Committee’s framing) a tool that changed what was possible in structural biology.

May 2024: AlphaFold 3 extends biomolecular prediction beyond proteins to nucleic acids, ions, and ligands (Abramson et al., 2024). The initial release is server-only, with restricted commercial use. The open-source response (Boltz, Chai) arrives within six months.

October 2025: Anthropic launches Claude for Life Sciences with integrations into Benchling, 10x Genomics, PubMed, and Synapse (Anthropic, October 2025). A year earlier, OpenAI’s o1 had reached 77–78% on GPQA Diamond, exceeding the 69.7% PhD-expert baseline on graduate-level biology, chemistry, and physics questions (OpenAI, September 2024).

May 2026: ARPA-H launches the IGoR program (Intelligent Generator of Research) to “deliver gold-standard biomedical science faster through an AI-powered research ecosystem” focused on Alzheimer’s, Parkinson’s, and autoimmune disease (HHS press release, May 2026).

June 2026: Anthropic launches Claude Science as a beta AI workbench for scientific research, describing Claude as an interface for literature search, data analysis, code execution, remote compute, and scientific software integrations (Anthropic, June 2026).

Together, these events changed the workflow of biological research. AI is no longer only an experiment to run; it is infrastructure to evaluate and use. The question is no longer “can I use AI?” but “which model, for which decision, with what validation?”

This handbook is written for those questions. It is not a tour of every model release. It is a framework for reading model outputs as research inputs, choosing systems by biological question rather than brand, and designing the experiment that turns a prediction into biological evidence.

What Life Sciences AI Is, and What It Is Not

The Discovery Layer

Life sciences AI sits in the discovery layer of biological research:

                     Population health
                            │
                            ▼
  Public Health AI ── Patient populations, surveillance, forecasting
                            │
                            ▼
                     Clinical practice
                            │
                            ▼
       Clinical AI ── Diagnosis, treatment, workflow, liability
                            │
                            ▼
                   Therapeutic development
                            │
                            ▼
   [LIFE SCIENCES AI] ── Molecules, cells, organisms, environments,
                          experiments, candidates
                            │
                            ▼
                       Fundamental biology

Each layer above depends on inputs from the layer below. A drug candidate exists before a clinical trial; a clinical trial exists before a regulatory decision; a regulatory decision exists before a population-scale deployment. Life sciences AI operates across molecules, cells, tissues, organisms, environments, and experiments. Its outputs can feed clinical, public health, biotechnology, agricultural, ecological, and biosecurity decisions.

The Three Adjacent Domains

Domain	Object of Study	Failure Cost	Evidence Standard
Life sciences AI	Molecules, cells, tissues, organisms, environments, experiments, research decisions	A failed experiment, a discontinued program, a non-validated paper	Benchmarks + prospective experimental validation
Clinical AI	Individual patients, diagnostic and therapeutic decisions	Misdiagnosis, mistreatment, liability	Prospective clinical trials, FDA clearance, real-world performance
Public health AI	Populations, surveillance, intervention design	Missed outbreaks, misallocated resources, eroded trust	Deployment-context evaluation, proper scoring rules, equity analysis

The three are complementary but distinct. A model that predicts a binding interaction is a life sciences AI claim. A model that recommends a treatment for a specific patient is a clinical AI claim. A model that forecasts hospitalizations is a public health AI claim. The same architectural family (transformers, diffusion, graph networks) underlies all three, but the evidence standard depends on the decision the model informs, not the math under the hood.

Companion handbooks in this series cover the clinical, population, and biosecurity layers explicitly. Links and descriptions appear on the welcome page under “Companion Handbooks.”

The Chapter Reference and Utility Standard

Every major chapter in this handbook starts with a field guide, then calibrates evidence. The field guide is the reference layer; it answers:

What is this field trying to solve?
What is the core idea?
What is the current state of the field?
What do we know, and what remains open?
Why does this matter?

The evidence layer answers:

What is demonstrated?

Supported by published evidence in peer-reviewed venues, official documentation, or reproducible benchmark results. The evidence must be specific to a defined task and dataset. “AlphaFold 2 predicts single-chain protein structures at near-experimental accuracy for the majority of well-folded domains evaluated in CASP14” is a demonstrated claim (Jumper et al., 2021).

What is theoretical?

Plausible given current methods but not yet established for routine use. The capability has been shown in selected systems, narrow tasks, or controlled settings without proven generalization. “Single-cell foundation models can transfer to new tissues and species” is currently a theoretical claim: published evidence supports transfer in some settings (Cui et al., 2024; Theodoris et al., 2023), but the boundary of useful transfer is an open research question.

What is beyond current capability?

Not supported by credible evidence with current systems. The capability is aspirational, has been demonstrated only in toy settings that do not generalize, or requires evidence that has not been produced. “Fully autonomous drug discovery without experimental validation” is beyond current capability. Coscientist demonstrates autonomous chemistry execution in bounded settings (Boiko et al., 2023), not autonomous discovery without measurement.

What would make this more promising?

The answer should name the next measurement: a blinded benchmark, prospective wet-lab validation, independent reproduction, field study, external cohort, or decision-relevant endpoint. A claim that does not name the missing evidence is not ready for program decisions.

What should researchers, biotech teams, funders, and program leaders do with this?

The answer should translate evidence into action: evaluate, adopt narrowly, test experimentally, fund as infrastructure, fund as research, or defer. The handbook treats action as part of evidence reading because science programs spend money, time, samples, and credibility on these decisions.

The point of the questions is not to be conservative: it is to be specific. A claim that cannot answer them is not a claim about a system; it is marketing.
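The three evidence tiers can be written down as a small data structure. The sketch below is illustrative only: the names `Tier`, `TieredClaim`, and `by_tier` are hypothetical, and the three entries are the example claims quoted in this section, lightly shortened, with the tier this chapter assigns to each.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    DEMONSTRATED = "demonstrated"
    THEORETICAL = "theoretical"
    BEYOND = "beyond current capability"


@dataclass(frozen=True)
class TieredClaim:
    text: str   # the capability claim, scoped to a task
    tier: Tier  # the tier this chapter assigns
    basis: str  # citation or reasoning given in the chapter


# The example claims quoted in this section (hypothetical container name).
CLAIMS = [
    TieredClaim(
        "AlphaFold 2 predicts single-chain protein structures at near-experimental "
        "accuracy for most well-folded domains evaluated in CASP14",
        Tier.DEMONSTRATED,
        "Jumper et al., 2021",
    ),
    TieredClaim(
        "Single-cell foundation models can transfer to new tissues and species",
        Tier.THEORETICAL,
        "Transfer shown in some settings (Cui et al., 2024; Theodoris et al., 2023); "
        "boundary of useful transfer is open",
    ),
    TieredClaim(
        "Fully autonomous drug discovery without experimental validation",
        Tier.BEYOND,
        "Coscientist shows bounded autonomous execution (Boiko et al., 2023), "
        "not discovery without measurement",
    ),
]


def by_tier(claims):
    """Count claims per tier, keeping tiers with zero claims visible."""
    counts = Counter(c.tier for c in claims)
    return {tier: counts.get(tier, 0) for tier in Tier}


for claim in CLAIMS:
    print(f"[{claim.tier.value}] {claim.text} ({claim.basis})")
```

Keeping the basis next to the tier is the design point: a tier without a citation or a stated reason is not something a reader can check.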

Reading Claims with the Framework

When a press release, a paper title, or a vendor pitch makes a capability claim, ask:

What biological object is the model representing? Sequence? Structure? Cell state? Tissue? Experiment? Reaction?
What is the specific task on which the claim is made? Property prediction? Generation? Ranking? Classification? Planning?
What is the evidence? Held-out benchmark performance? Prospective experiment? Cross-laboratory replication? Vendor-reported internal evaluation?
What would make this more promising? What evidence is still missing?
What should researchers, biotech teams, funders, and program leaders do with this? Evaluate, adopt narrowly, test, fund, or defer?

A claim that survives these questions is at minimum demonstrated for the specific task and dataset. A claim that cannot name the missing evidence is either theoretical or beyond current capability.
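One way to make the checklist concrete is to record the answers and list whichever questions were left blank. This is a minimal sketch, not a handbook tool: `ClaimReview`, its field names, and `unanswered` are hypothetical, and the example pitch is invented for illustration.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class ClaimReview:
    """One field per question in the list above (field names are hypothetical)."""
    biological_object: Optional[str] = None   # sequence, structure, cell state, ...
    task: Optional[str] = None                # prediction, generation, ranking, ...
    evidence: Optional[str] = None            # benchmark, prospective experiment, ...
    missing_evidence: Optional[str] = None    # what would make this more promising
    recommended_action: Optional[str] = None  # evaluate, adopt narrowly, test, fund, defer


def unanswered(review: ClaimReview) -> list[str]:
    """Return the names of questions with no answer or a whitespace-only answer."""
    return [
        f.name
        for f in fields(review)
        if not (getattr(review, f.name) or "").strip()
    ]


# Invented example: a pitch that names an object and a task but nothing else.
pitch = ClaimReview(
    biological_object="cell state",
    task="perturbation prediction",
)
print(unanswered(pitch))
```

The function reports only what is missing. Deciding whether a claim is theoretical or beyond current capability still requires reading the evidence itself.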

The Capability Landscape

The handbook organizes life sciences AI into six parts. Each part contains several chapters that apply the five-question standard to a specific model class.

Part I: Foundations (this part)

AI for the Life Sciences (this chapter): Scope and framework
History of AI in the Life Sciences: The five-decade arc from sequence alignment to foundation models
Biological Data Infrastructure: Bridge2AI, CZ CELLxGENE, Tabula Sapiens, and why AI-ready data is its own problem
Biomedical Knowledge Graphs and Literature AI: Evidence retrieval, graph reasoning, and literature-grounded claims
Foundation Models for Biology: Protein language models (ESM), single-cell foundation models (scGPT, Geneformer), genomic foundation models (Evo, Evo 2)
Evaluation Principles for Life Sciences AI: Held-out benchmarks, prospective validation, distribution shift, calibration

Part II: Molecular Discovery and Design

Protein Structure Prediction: AlphaFold lineage and confidence interpretation
Protein Design and Engineering: RFdiffusion, ProteinMPNN, the design-validate-iterate loop
Antibody and Biologic Design: Antibody-specific limitations and developability
Nucleic Acid and Genome Models: Evo, RNA structure, splicing
Variant Effect Prediction: AlphaMissense and clinical interpretation boundaries

Part III: Cells, Tissues, and Systems Biology

Single-Cell Foundation Models: scGPT, Geneformer, and what foundation means at single-cell resolution
Spatial Omics and Tissue Models: Tissue context and spatial representation learning
Cell Painting and Image-Based Phenotyping: Bray-2016 protocol and current AI methods
Histopathology AI: Whole-slide models and tissue biomarker research
Microscopy and Cryo-EM AI: Segmentation, denoising, particle picking, and reconstruction
Perturbation Prediction and Virtual Cells: GEARS, CZI Virtual Cells Platform, and the gap between annotation and prediction
Microbiome and Multi-Omics AI
Systems Biology and Multiscale Modeling: Gene regulatory networks, pathway models, and multiscale simulation

Part IV: Organismal and Environmental Biology

Neuroscience AI and Brain Foundation Models: Neural recordings, connectomics, imaging, and brain foundation models
Aging and Longevity Biology AI: Aging clocks, senescence biology, and geroscience model boundaries
Plant, Crop, and Agricultural AI: Plant genomics, crop phenotyping, and breeding decisions
Environmental and Ecological AI: Biodiversity, environmental DNA, ecological monitoring, and conservation biology
Virtual Organisms and Digital Biology: Organism-scale simulation and phenotype prediction

Part V: Therapeutic Discovery and Translation

Target Identification and Prioritization
Small Molecule Generation and ADMET
Chemical Biology and Target Engagement: Probes, mechanism-of-action inference, target engagement, degraders, and molecular glues
Drug Repurposing and Combination Therapy
mRNA, RNA, and Vaccine Design
Cell and Gene Therapy AI: Engineered cells, vectors, genome editing, potency assays, and manufacturing analytics
Diagnostics and Biomarker Translation: Assay validity, companion diagnostics, biomarker qualification, and context of use
Clinical Trial AI for Translational Research
Real-World Evidence and Biomarker AI
Translational Evidence and Failure Modes

Part VI: Research Systems, Practice, and Governance

Self-Driving Laboratories: Closed-loop experimentation
Robotic Lab Automation and Cloud Labs
Synthetic Biology Design Tools
AI for Biomanufacturing
Agentic Science Workflows: Coscientist, Virtual Lab, and what an agent actually owns
Toolkit for AI-Augmented Bio Research
Benchmarks for Bio AI
Reproducibility and Open Science
Information Hazards in Capability Research: Dual-use review
Workforce, Compute, and Institutional Readiness
Emerging Frontiers in AI for the Life Sciences

The Infrastructure Layer

Models are visible; infrastructure is decisive. The capability gaps in life sciences AI are often data gaps, benchmark gaps, or compute gaps before they are architecture gaps.

Public Programs

NIH Bridge2AI (NIH Common Fund, Bridge2AI Consortium): Four grand-challenge data generation projects (CHORUS for AI/ML in clinical care, CM4AI for functional genomics, VOICE for precision public health, AI-READI for salutogenesis). The program’s premise: AI-ready datasets are themselves an infrastructure problem that requires metadata, ethics review, quality control, and workforce development, not only more storage.
ARPA-H IGoR (ARPA-H programs page; HHS press release, May 2026): Intelligent Generator of Research, focused on Alzheimer’s, Parkinson’s, and autoimmune disease. ARPA-H also funds adjacent AI programs: ADVOCATE (cardiovascular AI agents), RAPID (rare-disease AI diagnostics), CATALYST (ADME-tox modeling), ADAPT (precision cancer therapy).
NCI Cancer Research Data Commons (NCI CRDC): Data infrastructure spanning genomics, proteomics, and imaging that AI work depends on, even when not formally an “AI program.”

Non-Profit and Foundation Programs

CZ Biohub and CZ CELLxGENE (CZ CELLxGENE Discover): Roughly 100 million curated single-cell observations in a standardized, queryable platform. The Tabula Sapiens collection (1.1M cells from 28 organs, 24 donors) is a benchmark first-draft human cell atlas.
CZI Virtual Cells Platform (CZI): An active program to build and benchmark foundation models for cell biology.
Arc Institute: Co-developer (with Stanford, UC Berkeley, UCSF, and NVIDIA) of the Evo and Evo 2 genomic foundation models (Nguyen et al., 2024; Brixi et al., 2026).

Frontier Labs

The major AI labs each have life-sciences programs at varying degrees of openness:

Lab	Visible Life-Sciences Work	Evidence Type
Google DeepMind / Isomorphic Labs	AlphaFold 2/3, AlphaMissense, AlphaProteo; Eli Lilly and Novartis drug-discovery partnerships (Isomorphic Labs, January 2024)	Peer-reviewed for AlphaFold lineage; AlphaProteo is an arXiv preprint (Zambaldi et al., 2024); partnerships are factual but not efficacy evidence
Anthropic	Claude for Life Sciences (October 2025), Claude Science beta (June 2026), AI for Science Program	Company announcements and official product materials; no peer-reviewed life-sciences paper as of this writing
OpenAI	Color Health cancer-screening copilot (OpenAI, June 2024); Moderna ChatGPT Enterprise deployment (OpenAI, April 2024); o1 model GPQA Diamond performance (OpenAI, September 2024)	Verified partnerships and benchmark results; no peer-reviewed biology paper
Meta FAIR / EvolutionaryScale	ESM-2 protein language model (Lin et al., 2023); ESM-3 multimodal (Hayes et al., 2025)	ESM-2 and ESM-3 peer-reviewed

Drug Discovery Companies Using AI

Recursion Pharmaceuticals (Recursion mission): High-content imaging plus ML. In 2025 the company reported its first AI-enabled clinical proof of concept; clinical candidates include REC-617 (CDK7) and REC-4881. Note: the company also disclosed a pipeline contraction in May 2025.
Insilico Medicine: Generative chemistry for IPF target TNIK; ISM001-055 reported positive Phase IIa topline (Insilico Medicine, November 2024). Company-reported efficacy; not yet peer-reviewed in a journal.
Insitro: Machine-learning models for metabolic disease and neuroscience; expanded Eli Lilly small-molecule collaboration in September 2025.

Read these with the framework: a partnership announcement is factual evidence of the partnership; it is not evidence that the AI-discovered molecule will read out positively, advance to Phase III, or change a patient’s outcome.

Who This Handbook Is For

The handbook is written for several overlapping audiences. The shared question is: when does an AI output deserve experimental attention?

Role	What You Need From This Handbook
Computational biologist	Capability tier for each model class; what the failure modes are; how to design a benchmark that reflects your actual question
Biotechnology team lead	Build-vs-buy framing; license diligence; what the open-source alternatives are when a frontier release is restricted
Drug discovery scientist	Where AI shifts a stage gate vs. where it does not; how to read a vendor pitch against published evidence
Physician-scientist	Translation between bench AI and clinical decision-making; what makes a discovery-stage AI claim relevant to the clinic
Synthetic biologist	Design tools, autonomous lab integration, dual-use considerations
Graduate student	Conceptual entry points into model classes; canonical citations; how to read benchmark results
Research program leader	Capital allocation framing; which capabilities are infrastructure-grade vs. research-grade; how to evaluate proposals that invoke AI

How to Read the Rest of the Handbook

Quick Orientation

Executive Summary: Handbook-wide conclusions
Protein Structure Prediction: The best-established capability and its limits
Evaluation Principles for Life Sciences AI: The framework that turns capability into decision

Deeper Orientation

Add:

Single-Cell Foundation Models: The capability frontier in cell biology
Self-Driving Laboratories: The autonomous laboratory frontier
Information Hazards in Capability Research: Dual-use considerations for design and generation tools

Deep Program Review

Read the relevant Part end-to-end. Each chapter is self-contained but cross-references the others.

Common Questions

Is life sciences AI the same as “AI for science”?

Overlapping but not identical. “AI for science” is a broader phrase that includes physics, chemistry, materials, climate, mathematics, and other domains. This handbook focuses on biological systems: proteins, cells, genomes, tissues, organisms, environments, and the research workflows that connect them.

Do I need to know machine learning to use this handbook?

No. The handbook is written for working biologists, drug discovery scientists, and program leaders. It treats ML as a research instrument the same way it would treat a sequencer or a microscope: you should understand what the output means and what its limits are; you do not need to build the instrument from scratch. The Foundation Models for Biology and Evaluation Principles chapters provide enough ML context to read the rest of the handbook.

How often does the handbook update?

Continuously, as major papers and benchmark results appear in Nature, Science, Cell, and adjacent venues, and as frontier-lab and FDA actions warrant. The publication and modification dates are visible in each chapter’s metadata.

Why is dual-use treated as a separate chapter rather than woven through?

Because the questions in dual-use review (what to publish, what to release, how to communicate capability without enabling misuse) apply to many model classes at once. The dedicated chapter (Information Hazards in Capability Research) keeps the framework in one place. Cross-references in individual chapters point back to it.

What’s the difference between a foundation model and a task-specific model in biology?

A foundation model is pretrained on broad data (sequences, structures, cells) and adapted to many downstream tasks. A task-specific model is trained for one task. In life sciences, the foundation-model wave is real: ESM-2 / ESMFold for proteins, scGPT and Geneformer for single cells, Evo and Evo 2 for genomes. The wave is also recent and the generalization claims are still being validated. Foundation Models for Biology treats this in depth.

Are AI-discovered drugs real yet?

In the sense that some AI-discovered molecules have reached and progressed in clinical trials, yes: Insilico’s ISM001-055 (Phase IIa, company-reported) is a frequently cited example. In the sense that an AI-discovered molecule has been approved and changed standard-of-care, not yet. Read the evidence with the framework: a Phase IIa topline is a meaningful signal, not a registration-grade outcome. Translational Evidence and Failure Modes covers this in depth.

What is demonstrated?

Demonstrated life sciences AI is task-specific. AlphaFold 2 supports single-chain protein structure prediction for many well-folded domains; RFdiffusion supports experimentally tested protein design; AlphaMissense supports missense-variant triage as a research tool; scGPT, Geneformer, Evo, Evo 2, GEARS, Coscientist, and Virtual Lab support selected representation, prediction, generation, and laboratory-workflow tasks. Each result is demonstrated only within its evaluated biology, data distribution, and validation design.

What is theoretical?

Theoretical claims include routine transfer across unseen tissues, species, protein families, cell states, and experimental platforms; cross-modality systems that reason cleanly from sequence to cell to organism; and AI agents that plan biology programs with limited human supervision. Current methods make those claims plausible in bounded settings, but the evidence still depends on prospective validation, independent replication, and the specific decision being made.

What is beyond current capability?

Beyond current capability includes general biological reasoning that removes the need for experiments, fully autonomous drug or biology discovery without human accountability, and model outputs treated as biological facts. A prediction may be useful, but measurement remains the ground truth. Any claim that skips the falsifying experiment is outside the evidence standard used in this handbook.

What would make this more promising?

The area becomes more promising with repeated prospective studies showing that AI-selected molecules, perturbations, targets, or experiments improve decisions across independent laboratories and biological contexts. Stronger evidence would pair benchmark performance with experimental denominators: what was generated, selected, tested, failed, and advanced. The decisive evidence is not a larger model announcement; it is a model-linked experiment that changes a research or program decision and survives external reproduction.
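The experimental denominators named above lend themselves to simple arithmetic. The sketch below is one possible way to report them, under stated assumptions: the class `Funnel`, its `rates` method, and all numbers are hypothetical, and it assumes failed and advanced candidates are both subsets of those tested, with any remainder treated as inconclusive.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Funnel:
    generated: int  # candidates the model produced
    selected: int   # candidates chosen for follow-up
    tested: int     # candidates actually measured
    failed: int     # tested candidates that failed
    advanced: int   # tested candidates that moved forward

    def __post_init__(self):
        # Assumption: each stage is a subset of the one before it.
        if not (self.generated >= self.selected >= self.tested >= 0):
            raise ValueError("expected generated >= selected >= tested >= 0")
        if self.failed < 0 or self.advanced < 0:
            raise ValueError("counts must be non-negative")
        if self.failed + self.advanced > self.tested:
            raise ValueError("failed + advanced cannot exceed tested")

    def rates(self) -> dict[str, float]:
        """Report each outcome against its denominator; 0.0 when the denominator is 0."""
        def ratio(num: int, den: int) -> float:
            return num / den if den else 0.0

        return {
            "selected_per_generated": ratio(self.selected, self.generated),
            "tested_per_selected": ratio(self.tested, self.selected),
            "advanced_per_tested": ratio(self.advanced, self.tested),
            "advanced_per_generated": ratio(self.advanced, self.generated),
            "inconclusive_per_tested": ratio(
                self.tested - self.failed - self.advanced, self.tested
            ),
        }


# Hypothetical numbers, for illustration only.
campaign = Funnel(generated=10_000, selected=200, tested=50, failed=42, advanced=5)
for name, value in campaign.rates().items():
    print(f"{name}: {value:.4f}")
```

Reporting the advanced count against both the tested and the generated denominators keeps a headline hit rate from hiding how much filtering happened before the experiment.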

What should researchers, biotech teams, funders, and program leaders do with this?

Name the biological object first: sequence, structure, ligand, cell state, tissue, experiment, or clinical endpoint. The right model class depends on the object.
Name the validation object second. What experiment, benchmark, or independent dataset would change your decision if the model were wrong?
Do not equate a model score with biological truth. A high pLDDT, a low Tanimoto similarity, a strong attention weight: these are model outputs, not measurements.
Treat every vendor claim as a claim about a specific data distribution until proven otherwise. A model that works on one cell line, one species, or one assay does not work on all of them.
Read the license before scoping the project. A model you cannot run on your infrastructure is not usable for your program.
Cite by version and venue. “AlphaFold” without a version is ambiguous; an announcement is not a paper; a partnership is not an outcome.
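To see why a score such as Tanimoto similarity is a model output and not a measurement, it helps to look at how it is computed. The sketch below uses the standard set form of the Tanimoto coefficient, the size of the intersection divided by the size of the union. The function name and the fingerprint bit indices are hypothetical, and real pipelines would derive fingerprints with a cheminformatics library.

```python
def tanimoto(a: set[int], b: set[int]) -> float:
    """Tanimoto coefficient between two fingerprints given as sets of on-bit indices."""
    if not a and not b:
        raise ValueError("Tanimoto is undefined for two empty fingerprints")
    return len(a & b) / len(a | b)


# Hypothetical on-bit indices for two molecules; no chemistry is implied.
candidate = {1, 4, 7, 9}
known_binder = {4, 9, 12}

score = tanimoto(candidate, known_binder)
print(f"Tanimoto similarity: {score:.2f}")
```

Nothing in the calculation touches an assay. The score depends entirely on how the molecules were encoded, which is why a low or high value still needs an experiment behind it.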

Cross-References

Executive Summary: Handbook-wide conclusions
Biological Data Infrastructure: The data layer underneath the models
Foundation Models for Biology: Architectural lineage of current systems
Evaluation Principles for Life Sciences AI: The framework that turns capability into decision
Protein Structure Prediction: The best-established demonstrated capability
Information Hazards in Capability Research: Dual-use review
Companion handbooks in this series: see “Companion Handbooks” on the welcome page