The Life Sciences AI Handbook: Steering Frontier Models in Biology

Tegomoh, Bryan

Real-World Evidence and Biomarker AI

Published: July 7, 2026

Real-world evidence and biomarker AI sit where clinical data, molecular data, and therapeutic development meet. The opportunity is to learn from data generated outside tightly controlled trials and to discover biological signals that improve development decisions. The risk is that messy data, confounding, missingness, and weak context-of-use definitions can make an association look like evidence.

Learning Objectives

Use this chapter to:

Use EHRs, claims, registries, wearables, genomics, and biomarker data to support translational learning outside traditional trial datasets.
Explain how the target-trial question, cohort construction, missingness, confounding, endpoint definition, and context of use decide whether evidence is credible.

Chapter Summary (TL;DR)

Summary: Use EHRs, claims, registries, wearables, genomics, and biomarker data to support translational learning outside traditional trial datasets. AI helps curate records, extract endpoints, and integrate biomarkers, but causal treatment effects from observational data require careful design.

Key point: The target-trial question, cohort construction, missingness, confounding, endpoint definition, and context of use decide whether evidence is credible. Open question: whether AI-derived evidence remains credible after causal design, missingness, measurement, and confounding are handled.

Bottom line: RWE connects clinical trials, diagnostics, biomarkers, therapeutics, population evidence, and post-market learning.

Field Guide

What is this field trying to solve? Use EHRs, claims, registries, wearables, genomics, and biomarker data to support translational learning outside traditional trial datasets.

What is the core idea? The target-trial question, cohort construction, missingness, confounding, endpoint definition, and context of use decide whether evidence is credible.

What is the current state of the field? AI helps curate records, extract endpoints, and integrate biomarkers, but causal treatment effects from observational data require careful design.

What do we know, and what remains open? Known reference points include FDA RWE guidance, target-trial emulation, registries, EHR datasets, claims databases, wearable streams, biomarker qualification resources, and synthetic-control methods. What remains open is whether AI-derived evidence remains credible after causal design, missingness, measurement, and confounding are handled.

Why does this matter? RWE connects clinical trials, diagnostics, biomarkers, therapeutics, population evidence, and post-market learning.

Introduction

Randomized trials remain the central evidence source for therapeutic efficacy, but they do not answer every development question. Real-world data can inform natural history, feasibility, external controls, safety surveillance, treatment patterns, comparative effectiveness hypotheses, and post-approval evidence. Biomarkers can help identify risk, mechanism, treatment response, safety, or a defined surrogate endpoint.

FDA’s real-world evidence program and framework are explicit that context matters (FDA, 2026; FDA, 2018). The same dataset may support descriptive epidemiology, fail as causal evidence, and still be useful for trial design. The first question is not which model to use. It is what decision the evidence is meant to support.

FDA authors framed RWE as information that can inform product safety and effectiveness when the data source and analytic design are fit for purpose (Sherman et al., 2016; Corrigan-Curay et al., 2018). The methodological discipline is causal, not cosmetic. Target-trial emulation requires specifying eligibility, treatment strategies, time zero, follow-up, outcome, causal contrast, and analysis before fitting a model (Hernán and Robins, 2016).

The target-trial protocol is the first artifact

A serious RWE analysis starts by writing the randomized trial that would have been run if randomization were feasible. The protocol names eligibility criteria, treatment strategies, assignment time, follow-up, outcome, causal contrast, and analysis plan before any model is fit. Hernán’s JAMA guide makes this operational because it turns observational analysis into a trial design problem rather than a feature-selection problem (Hernán et al., 2022).

This discipline is especially important for AI because the model can make confounding look precise. A high-performing endpoint extractor or risk score does not fix immortal time, treatment-selection bias, missing severity, incompatible comparators, or endpoint misclassification. Hubbard and colleagues emphasized the same caution in the New England Journal of Medicine: target-trial emulation can be valuable when designed carefully, but the term should not be used to launder weak observational comparisons (Hubbard et al., 2024).
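
As a minimal sketch of treating the protocol as an artifact, the seven elements named in this section can be recorded in a small structure and checked for gaps before any model is fit. The class name `TargetTrialProtocol`, the helper `missing_elements`, and the example entries are hypothetical illustrations, not part of any cited framework or tool.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class TargetTrialProtocol:
    """Hypothetical container for the protocol elements listed in this section."""
    eligibility: str
    treatment_strategies: str
    assignment_time: str  # time zero
    follow_up: str
    outcome: str
    causal_contrast: str
    analysis_plan: str


def missing_elements(protocol):
    """Return the names of protocol elements that were left blank."""
    return [f.name for f in fields(protocol) if not getattr(protocol, f.name).strip()]


# Illustrative draft only; every entry below is invented for the example.
draft = TargetTrialProtocol(
    eligibility="Adults with a new diagnosis and no prior exposure to either drug",
    treatment_strategies="Initiate drug A versus initiate drug B",
    assignment_time="",  # time zero not yet defined
    follow_up="From time zero until outcome, death, disenrollment, or 24 months",
    outcome="First hospitalization for the condition of interest",
    causal_contrast="",  # intention-to-treat versus per-protocol not yet chosen
    analysis_plan="Pooled logistic regression with baseline covariate adjustment",
)

print(missing_elements(draft))  # ['assignment_time', 'causal_contrast']
```

An empty result only shows that each element has been written down. It is a precondition for analysis, not evidence that the design is sound.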

External controls require bias maps

External or synthetic controls are attractive when randomized controls are difficult, unethical, or too slow. Their credibility depends on whether the external cohort matches the trial population, time zero, disease severity, outcome measurement, follow-up, and standard-of-care background. The bias map should name what could differ and which sensitivity analyses address it.

AI can support matching, abstraction, endpoint extraction, and missingness modeling, but it should not hide the causal question. High-throughput target-trial emulation for Alzheimer’s disease repurposing illustrates how real-world data can be screened systematically while still requiring careful confounding control and interpretation (Zang et al., 2023). For regulatory evidence, the analytic design and data provenance remain as important as the model.
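
One way to make the bias map concrete is to list each dimension on which the external cohort could differ from the trial, together with the sensitivity analyses assigned to it, and then flag the dimensions that have none. The sketch below assumes a hypothetical `BiasEntry` record and invented example rows; the sensitivity analyses named are placeholders, not recommendations.

```python
from dataclasses import dataclass, field


@dataclass
class BiasEntry:
    """One row of a hypothetical bias map for an external-control comparison."""
    dimension: str            # e.g. "time zero", "disease severity"
    possible_difference: str  # how the external cohort could differ from the trial
    sensitivity_analyses: list = field(default_factory=list)


def unaddressed(bias_map):
    """Return the dimensions that have no sensitivity analysis assigned yet."""
    return [entry.dimension for entry in bias_map if not entry.sensitivity_analyses]


# Invented example rows for illustration.
bias_map = [
    BiasEntry(
        "population",
        "External cohort drawn from community practice, trial from academic sites",
        ["Restrict external cohort to trial eligibility criteria"],
    ),
    BiasEntry(
        "time zero",
        "External index date is diagnosis, trial index date is enrollment",
        ["Re-anchor external index date to first qualifying visit"],
    ),
    BiasEntry("disease severity", "Performance status recorded inconsistently in the external source"),
    BiasEntry("outcome measurement", "Progression assessed on a fixed schedule only in the trial"),
]

print(unaddressed(bias_map))  # ['disease severity', 'outcome measurement']
```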

Biomarker AI needs a qualification ladder

A biomarker model should state its intended category before reporting performance. Prognostic biomarkers stratify baseline risk; predictive biomarkers identify differential treatment effect; pharmacodynamic biomarkers measure pathway engagement; safety biomarkers flag harm; surrogate endpoints require evidence that treatment effects on the biomarker predict treatment effects on clinical outcomes. A model that predicts outcome is not automatically predictive of treatment benefit.

The qualification ladder therefore has four steps: analytic validity (the measurement is reliable), clinical validity (the measurement relates to the clinical state or outcome), clinical utility (acting on it improves a decision), and context-of-use acceptance. AI can help discover candidate biomarkers, but qualification still belongs to this ladder.

What is demonstrated?

Real-World Data Curation

RWE work begins with data curation. EHR-derived data, claims, registries, pharmacy records, imaging, pathology, genomics, and device data all require provenance, harmonization, missingness checks, and audit trails. AI methods can support abstraction, coding, endpoint extraction, de-identification, and entity resolution, but the evidence value comes from the curated dataset and analysis design.
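
A missingness check can be as simple as tabulating, per source system, the fraction of records in which each required field is absent. The following is a sketch under stated assumptions: the function name `missingness_by_source`, the `source` key, and the toy records are all hypothetical, and a field counts as missing when it is `None` or not present at all.

```python
from collections import defaultdict


def missingness_by_source(records, fields):
    """Return {source: {field: fraction of records where the field is None or absent}}."""
    counts = defaultdict(lambda: defaultdict(int))
    totals = defaultdict(int)
    for rec in records:
        src = rec["source"]
        totals[src] += 1
        for f in fields:
            # Test for None explicitly so that valid zero values (e.g. ECOG 0)
            # are not counted as missing.
            if rec.get(f) is None:
                counts[src][f] += 1
    return {src: {f: counts[src][f] / totals[src] for f in fields} for src in totals}


# Toy records invented for illustration.
records = [
    {"source": "ehr_a", "stage": "II", "ecog": 1},
    {"source": "ehr_a", "stage": None, "ecog": 0},
    {"source": "claims_b", "stage": None, "ecog": None},
    {"source": "claims_b", "stage": None},
]

report = missingness_by_source(records, ["stage", "ecog"])
for source, by_field in report.items():
    print(source, by_field)
```

Breaking the tabulation out by source matters because a field that looks acceptably complete overall can be entirely absent from one contributing system.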

The FDA RWE page and framework define the regulatory boundary: RWE may support new indications for approved drugs and post-approval study requirements in selected contexts, but suitability depends on data relevance, reliability, study design, and analysis plan (FDA, 2026; FDA, 2018).

Observational analyses can substitute for randomized evidence only under constrained conditions, with careful attention to confounding, treatment timing, comparator selection, outcome measurement, and sensitivity analysis (Franklin and Schneeweiss, 2017). AI can improve curation, but it cannot repair a design that starts with immortal time, unmeasured severity, or incompatible endpoints.
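
Immortal time is easiest to see with a toy calculation. In the sketch below, the naive allocation counts all follow-up of eventually treated patients as treated, including the waiting period before initiation during which, by construction, they could not have had the event as treated patients. The alternative allocation assigns that waiting period to untreated person-time. The `Patient` record, the `rate_ratio` helper, and the four-patient cohort are invented for illustration, and the sketch assumes every event in an eventually treated patient occurs after initiation. Time-varying exposure classification is only one way to remove the misclassified person-time; it does not replace aligning time zero with treatment assignment in the protocol.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Patient:
    treat_start_day: Optional[int]  # days from eligibility to initiation; None if never treated
    end_day: int                    # day of event or censoring, counted from eligibility
    event: bool


def rate_ratio(patients, naive):
    """Treated versus untreated event-rate ratio under two person-time allocations."""
    t_days = u_days = t_events = u_events = 0
    for p in patients:
        if p.treat_start_day is None:
            u_days += p.end_day
            u_events += p.event
        elif naive:
            # Naive: the pre-initiation wait is wrongly counted as treated time.
            t_days += p.end_day
            t_events += p.event
        else:
            # Corrected: days before initiation are untreated person-time.
            u_days += p.treat_start_day
            t_days += p.end_day - p.treat_start_day
            t_events += p.event
    return (t_events / t_days) / (u_events / u_days)


# Toy cohort invented for illustration.
cohort = [
    Patient(treat_start_day=100, end_day=300, event=True),
    Patient(treat_start_day=200, end_day=400, event=False),
    Patient(treat_start_day=None, end_day=50, event=True),
    Patient(treat_start_day=None, end_day=300, event=False),
]

print(round(rate_ratio(cohort, naive=True), 3))   # 0.5
print(round(rate_ratio(cohort, naive=False), 3))  # 1.625
```

With identical events, the naive allocation makes treatment look protective and the corrected allocation does not. No improvement in endpoint extraction changes that, because the error sits in the design rather than in the data.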

Biomarker Discovery and Qualification

Biomarker AI spans discovery, measurement, validation, and qualification. A model may identify molecular, imaging, digital, pathology, or clinical features associated with outcome. The professional distinction is category and context: prognostic, predictive, pharmacodynamic, safety, surrogate, or companion diagnostic.

The foundational biomarker taxonomy distinguishes biomarkers from clinical endpoints and surrogate endpoints (Biomarkers Definitions Working Group, 2001). Surrogate endpoint claims require evidence that treatment effects on the biomarker reliably predict treatment effects on a clinically meaningful outcome in the proposed context (Fleming and Powers, 2012). An AI-derived feature can be a useful biomarker and still fail as a surrogate endpoint.

FDA’s Biomarker Qualification Program ties biomarker use to a context of use (FDA, 2026). Qualification does not mean a biomarker is universally valid. It means the biomarker is accepted for the defined purpose and conditions.

Context of use is the evidence anchor. Without it, a biomarker claim is only an association.

Multi-Omic Biomarker Integration

Multi-omic biomarker discovery combines genomics, transcriptomics, proteomics, metabolomics, pathology, imaging, and clinical data. The value is strongest when each modality answers a different biological question and the validation endpoint is specified. The risk is that the model learns batch, site, assay version, or cohort structure.

Strong multi-omic biomarker work includes sample-level provenance, assay versioning, missingness maps, population stratification, train-test separation by site or time, and orthogonal validation. If a biomarker is intended to predict treatment effect, prognostic performance is not enough.
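
Train-test separation by site can be sketched without any modeling library: every sample from a held-out site goes to the test set, and a follow-up check reports which attribute values still appear on both sides. The function names, record keys, and sample rows below are hypothetical.

```python
def split_by_site(samples, held_out_sites):
    """Send every sample from a held-out site to the test set, all others to training."""
    train = [s for s in samples if s["site"] not in held_out_sites]
    test = [s for s in samples if s["site"] in held_out_sites]
    return train, test


def shared_values(train, test, key):
    """Values of `key` that appear on both sides of the split."""
    return {s[key] for s in train} & {s[key] for s in test}


# Toy sample manifest invented for illustration.
samples = [
    {"id": "s1", "site": "A", "assay_version": "v1"},
    {"id": "s2", "site": "A", "assay_version": "v1"},
    {"id": "s3", "site": "B", "assay_version": "v2"},
    {"id": "s4", "site": "C", "assay_version": "v2"},
]

train, test = split_by_site(samples, held_out_sites={"C"})
print(shared_values(train, test, "site"))           # set()
print(shared_values(train, test, "assay_version"))  # {'v2'}
```

In this toy manifest no site is shared across the split, but the held-out site uses an assay version that also appears in training. The split therefore tests transfer across sites and says nothing about transfer across assay versions.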

Platform Category

Tempus, Flatiron Health, Komodo Health, and Datavant/Aetion illustrate the platform category for clinical, molecular, and real-world data in life sciences. Tempus’s 2024 S-1 filing is useful industrial context for the clinical and molecular data platform thesis (Tempus AI, 2024).

Company sources establish category and positioning. They do not establish that a specific RWE or biomarker claim is decision-grade. For diligence, request data dictionaries, cohort definitions, endpoint definitions, missingness analysis, audit trails, and external validation.

Beyond Trial Operations

Clinical trial AI covers operational use cases such as recruitment, site selection, monitoring, and endpoint extraction. RWE and biomarker AI is a distinct evidence category because it asks whether non-trial data or model-derived biomarkers can support inference. The overlap is real, but the evidentiary burden differs.

The clean separation is this: trial operations improve execution; RWE and biomarker work may affect evidence. Evidence-affecting use carries a higher standard.

What is theoretical?

AI-Derived Predictive Biomarkers

AI-derived predictive biomarkers are plausible when the model identifies patients with differential treatment benefit. The challenge is causal: prognostic markers are easier to find than predictive markers. Demonstrating treatment-effect modification requires appropriate trial or quasi-experimental design, not only observational association.
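
The difference between a prognostic and a predictive marker can be shown with counts from a hypothetical randomized trial stratified by marker status. The sketch below compares two quantities: the gap in control-arm risk between marker-positive and marker-negative patients (a prognostic signal) and the difference in treatment effect between the two strata (a predictive signal). All counts are invented, the helper names are hypothetical, and the treatment effect is measured on the risk-difference scale, which is an assumption: effect modification is scale-dependent, and the same counts can look different on a ratio scale.

```python
def control_risk(stratum):
    """Event risk in the control arm of one marker stratum."""
    return stratum["control_events"] / stratum["control_n"]


def risk_difference(stratum):
    """Treated minus control event risk within one marker stratum."""
    treated = stratum["treated_events"] / stratum["treated_n"]
    return treated - control_risk(stratum)


def summarize(marker):
    pos, neg = marker["positive"], marker["negative"]
    return {
        # Prognostic signal: baseline risk differs by marker status.
        "baseline_risk_gap": control_risk(pos) - control_risk(neg),
        # Predictive signal: treatment effect differs by marker status.
        "interaction_contrast": risk_difference(pos) - risk_difference(neg),
    }


# Invented counts. Marker A shifts baseline risk but not the treatment effect.
marker_a = {
    "positive": {"treated_n": 100, "treated_events": 40, "control_n": 100, "control_events": 50},
    "negative": {"treated_n": 100, "treated_events": 10, "control_n": 100, "control_events": 20},
}
# Invented counts. Marker B shifts baseline risk and the treatment effect.
marker_b = {
    "positive": {"treated_n": 100, "treated_events": 20, "control_n": 100, "control_events": 50},
    "negative": {"treated_n": 100, "treated_events": 20, "control_n": 100, "control_events": 20},
}

for name, marker in [("A", marker_a), ("B", marker_b)]:
    print(name, {k: round(v, 2) for k, v in summarize(marker).items()})
```

Both markers separate high-risk from low-risk patients equally well, so an outcome-prediction metric would not distinguish them. Only marker B carries an interaction contrast far from zero, and that contrast is estimable here only because the toy data are randomized within strata.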

External Controls and Synthetic Arms

RWE can support external-control or synthetic-control work in selected settings, especially rare disease or oncology contexts with strong natural-history data and ethical constraints on randomization. The theoretical value is high, but bias control is difficult. Eligibility alignment, calendar time, outcome definition, missingness, treatment changes, and unmeasured confounding decide credibility.

Learning Health Data Loops

The long-term use case is a learning loop where real-world data informs biomarker discovery, trial design, post-market safety, and label refinement. The practical barrier is governance: consent, privacy, data contracts, quality standards, model monitoring, and institutional accountability.

What is beyond current capability?

Causal Treatment Effects from Observational Models Alone

No observational model establishes treatment effect without design assumptions. Confounding by indication, immortal-time bias, measurement bias, missingness, and treatment selection remain central threats.
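
A toy example of confounding by indication shows why design assumptions carry the inference. In the invented counts below, severe patients are both more likely to be treated and more likely to have the outcome. The crude comparison makes treatment look harmful, while standardizing over severity strata reverses the sign. The function names and counts are hypothetical, and the standardized estimate is valid only under the assumptions that severity is the sole confounder and that both treatment groups are represented in every stratum.

```python
def crude_risk_difference(strata):
    """Treated minus untreated risk, ignoring severity."""
    t_events = sum(s["treated_events"] for s in strata)
    t_n = sum(s["treated_n"] for s in strata)
    u_events = sum(s["untreated_events"] for s in strata)
    u_n = sum(s["untreated_n"] for s in strata)
    return t_events / t_n - u_events / u_n


def standardized_risk_difference(strata):
    """Stratum-specific risk differences weighted by each stratum's share of the cohort."""
    total = sum(s["treated_n"] + s["untreated_n"] for s in strata)
    rd = 0.0
    for s in strata:
        weight = (s["treated_n"] + s["untreated_n"]) / total
        rd += weight * (
            s["treated_events"] / s["treated_n"]
            - s["untreated_events"] / s["untreated_n"]
        )
    return rd


# Invented counts: severe patients are treated more often and have worse outcomes.
strata = [
    {"severity": "severe", "treated_n": 80, "treated_events": 32,
     "untreated_n": 20, "untreated_events": 10},
    {"severity": "mild", "treated_n": 20, "treated_events": 2,
     "untreated_n": 80, "untreated_events": 16},
]

print(round(crude_risk_difference(strata), 2))         # 0.08
print(round(standardized_risk_difference(strata), 2))  # -0.1
```

The adjustment is possible here only because severity was recorded. If severity were missing from the source data, no model fit to the remaining columns could recover the standardized estimate.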

Universal Biomarker Transfer

A biomarker trained in one cohort, assay, ancestry mix, disease stage, or care setting should not be assumed to transfer. Transportability needs explicit evaluation.

Replacing Randomization by Default

RWE may support selected regulatory questions, but it does not replace randomization by default. The evidentiary standard depends on indication, intervention, endpoint, available data, and regulator engagement.

What would make this more promising?

RWE and biomarker AI become more promising when the dataset, causal question, measurement, and decision are tied together before modeling.

Claim	Evidence that raises or lowers confidence
“The real-world data are fit for purpose”	Provenance, completeness, linkage quality, coding logic, missingness, time windows, and audit trail support the intended decision
“The analysis estimates treatment effect”	Target-trial protocol, time-zero alignment, comparator definition, confounding control, and sensitivity analyses are prespecified
“The biomarker is useful”	Biomarker category, assay, specimen, threshold, validation cohort, and clinical action are defined
“The external control is credible”	Eligibility, calendar time, disease severity, endpoint, follow-up, and standard-of-care background match the trial question
“The evidence can support a regulatory decision”	Context of use, study design, data reliability, analysis plan, and regulator engagement are documented

The most important change is moving from association to a stated decision: who is included, what is measured, what action follows, and what uncertainty is acceptable.

What should researchers, biotech teams, funders, and program leaders do with this?

Write the context of use before reviewing performance. Specify decision, population, endpoint, data source, time window, and acceptable uncertainty.

Separate descriptive, predictive, and causal claims. Do not treat a risk model, association model, and treatment-effect claim as interchangeable.

Specify the target trial when estimating treatment effects from real-world data. If the trial cannot be stated, the causal claim is not ready.

Audit data provenance. Record source systems, extraction logic, coding systems, missingness, assay versions, linkage quality, and de-identification steps.

Validate biomarkers by category. Prognostic, predictive, pharmacodynamic, safety, surrogate, and companion-diagnostic claims need different evidence.

Treat platform claims as diligence leads. Ask for cohort construction, data dictionaries, endpoint adjudication, validation reports, and governance controls.