Emerging Frontiers in AI for the Life Sciences

Published: May 24, 2026

The hardest part of writing about the future of AI in the life sciences is separating frontiers where partial evidence is accumulating from frontiers where the claim is currently hype. This chapter does that separation explicitly. Seven frontiers are organised by how much evidence they currently have: demonstrated in partial form, theoretical but plausible, and beyond what current methods can support. The aim is not prediction. The aim is to give a reader in 2026, and a reader in 2030 looking back, a calibrated map of where the field stood and which claims were worth taking seriously.

Learning Objectives

This chapter is the forward-looking counterpart to the History of AI in the Life Sciences chapter. History reads the arc backward; this chapter reads it forward, with the same discipline of grounded claims and named open questions. You will learn to:

  • Distinguish three classes of frontier: partially demonstrated, theoretical but plausible, beyond current capabilities
  • Recognise the seven durable frontiers most likely to define AI in life sciences for the next decade
  • Read AI-in-biology capability claims against this map, separating real progress from marketing
  • Frame what would have to be true for a frontier to materialise, instead of guessing dates
  • Identify which frontiers are AI-bound and which are biology-bound (the latter will not be solved by more compute)
  • Decide which frontiers your team should track, which to engage with, and which to defer
  • Avoid the two opposite failure modes in talking about the future: hype that overstates what is near, and dismissiveness that misses what is already changing

Seven frontiers, tiered by current evidence, followed by two claims that sit beyond current capabilities:

| Tier | Frontier | Current state, in one line |
|---|---|---|
| Partially demonstrated | Virtual cells (cell-type-specific) | Per-cell-type models exist; a general-purpose virtual cell does not |
| Partially demonstrated | Autonomous discovery loops | Working in materials science and narrow biology; broad clinical biology unproven |
| Partially demonstrated | Multi-modal biology models | AlphaFold 3, ESM-3, Boltz-2 combine 2-3 modalities; full integration is research |
| Theoretical, plausible | AI-designed therapeutics through approval | Candidates in clinical trials in 2026; no AI-generated molecule has yet cleared Phase 3 and been approved |
| Theoretical, plausible | Personalised medicine at population scale | Variant interpretation triage exists; clinical-quality predictions for most variants do not |
| Theoretical, contested | Global access narrowing or widening | Both trajectories are visible in 2026; outcome depends on policy more than capability |
| Theoretical, contested | Workforce transition | Routine analysis shifts to AI; experimental judgment remains human; net effect uncertain |
| Beyond current | Single universal biology foundation model | Theoretically interesting; not a credible near-term direction |
| Beyond current | Fully autonomous wet-lab science without human judgment | Demos exist; reliable production science does not |

Three durable framings to apply to any future claim:

  1. Demonstrated in partial form, theoretical but plausible, or beyond current capabilities. Most marketing claims slot into the second or third bucket while implying they are in the first.
  2. AI-bound or biology-bound. Several frontiers will not be solved by larger models; they are limited by the rate at which biological experiments can be run, or by the cost of measuring phenotypes that AI must learn to predict.
  3. What would have to be true. A frontier worth taking seriously has a small number of named conditions that would, if satisfied, make it real. Frontiers without such conditions are usually wishes.
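The three framings can be turned into a small record that a team fills in for each claim it reads. The following is a minimal Python sketch under stated assumptions: the class names, field names, and the `triage` rule are illustrative inventions, not part of any tool described in this chapter, and the mapping from tier to action is one possible reading of the framings. The governance-bound category is borrowed from the autonomous-loops discussion later in the chapter.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Set


class Tier(Enum):
    PARTIALLY_DEMONSTRATED = "partially demonstrated"
    THEORETICAL = "theoretical but plausible"
    BEYOND_CURRENT = "beyond current capabilities"


class Bound(Enum):
    AI = "AI-bound"
    BIOLOGY = "biology-bound"
    GOVERNANCE = "governance-bound"


@dataclass
class Condition:
    """One named condition that would have to be true (framing 3)."""
    description: str
    bound: Bound
    satisfied: bool = False


@dataclass
class FrontierClaim:
    name: str
    tier: Tier                                   # framing 1
    conditions: List[Condition] = field(default_factory=list)

    def is_wish(self) -> bool:
        # Framing 3: a frontier with no named conditions is usually a wish.
        return len(self.conditions) == 0

    def open_bounds(self) -> Set[Bound]:
        # Framing 2: which kinds of constraint still block the frontier.
        return {c.bound for c in self.conditions if not c.satisfied}


def triage(claim: FrontierClaim) -> str:
    """Hypothetical rule: engage, track, or defer (an assumption, not a standard)."""
    if claim.is_wish() or claim.tier is Tier.BEYOND_CURRENT:
        return "defer"
    if claim.tier is Tier.PARTIALLY_DEMONSTRATED:
        return "engage"
    return "track"


loops = FrontierClaim(
    name="Autonomous discovery loops",
    tier=Tier.PARTIALLY_DEMONSTRATED,
    conditions=[
        Condition("reliable, scalable assays", Bound.BIOLOGY),
        Condition("auditable agent reasoning", Bound.AI),
        Condition("frameworks for agent-led decisions", Bound.GOVERNANCE),
    ],
)
print(triage(loops), sorted(b.value for b in loops.open_bounds()))
```

The value of writing claims down this way is less the decision rule than the forced fields: a claim that cannot be given a tier or a single named condition has told you something already.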

Introduction

The genre of “future of AI in [domain]” is unusually dangerous because it rewards confident prediction over calibrated uncertainty, and confident prediction in this field has a poor recent track record. The autonomous-driving timeline of the 2010s, the early clinical-AI deployments of the 2020s, and the recurring “drug discovery in two years” claims of every venture cycle should make any practitioner cautious about specific date predictions.

The defence is a more disciplined form of the same exercise: distinguish what is happening from what could happen from what should not yet be assumed. The structure of this chapter follows the project’s standard tier convention used in the Evaluation Principles chapter:

  • Demonstrated frontiers have partial working evidence, not full delivery. They are most likely to be the dominant developments of the next five years.
  • Theoretical frontiers are plausible from current evidence but have not yet been shown end to end. They are the most contested zone, where credible disagreement is healthy.
  • Beyond current capabilities frontiers should be treated with deep skepticism. They may eventually materialise, but capability claims in this tier today are claims, not results.

The aim is calibration. A practitioner in 2026 should be able to read this chapter and know which frontiers to track, which to engage with, which to defer, and which to ignore. A practitioner in 2030 should be able to read this chapter and see, item by item, where the calibration was right and where it was wrong.

Demonstrated

Virtual cells (cell-type-specific)

A virtual cell is a computational model that predicts how a cell will respond to perturbations: drugs, genetic edits, signalling inputs, environmental changes. The community roadmap (Bunne et al., 2024) frames the virtual cell not as a single model but as a multi-scale, multi-modal predictive system: gene expression, protein activity, metabolism, signalling, structure, behaviour, all coupled.

What exists in 2026: cell-type-specific perturbation models (Geneformer, scGPT, scFoundation lineage), pathway-specific simulation models, and several single-cell foundation models that capture useful representations across cell types. What does not exist: a general-purpose virtual cell that handles arbitrary perturbations on arbitrary cells with the reliability that AlphaFold 2 brought to structure prediction. The Perturbation Prediction and Virtual Cells chapter treats the current state in detail.

What would have to be true for a general virtual cell to materialise: orders-of-magnitude more perturbation data than currently exists, a foundation-model architecture that integrates the multi-scale data without losing biology, and an evaluation regime as rigorous as CASP was for structure. None of these is impossible; none is close to being in place.

End-to-end autonomous discovery loops

A closed-loop discovery system proposes experiments via a model, executes them via robotic systems, analyses results via a model, and retrains. Materials science has working examples; biology has narrower ones (microbiome optimisation, enzyme engineering, certain antibody affinity maturation campaigns). The Self-Driving Laboratories and Agentic Science Workflows chapters track the current evidence.
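The propose, execute, analyse, retrain cycle can be illustrated with a toy simulation. This is a sketch under heavy assumptions, not a description of any real self-driving laboratory: `run_assay` is a hypothetical stand-in for robotic execution (a noisy response with a hidden optimum at dose 0.7), the surrogate model is a deliberately crude nearest-neighbour lookup, and the exploration bonus is an arbitrary choice made for illustration.

```python
import random


def run_assay(dose, rng):
    """Stand-in for robotic execution: hypothetical noisy assay peaking at dose 0.7."""
    return 1.0 - (dose - 0.7) ** 2 + rng.gauss(0.0, 0.005)


def predict(dose, observations):
    """Toy surrogate model: the response at the nearest dose measured so far."""
    nearest = min(observations, key=lambda obs: abs(obs[0] - dose))
    return nearest[1]


def propose(candidates, observations):
    """Pick the untested dose with the best predicted response, plus a bonus
    for distance from tested doses so the loop keeps exploring."""
    tested = {dose for dose, _ in observations}

    def score(dose):
        distance = min(abs(dose - t) for t in tested)
        return predict(dose, observations) + distance

    untested = [c for c in candidates if c not in tested]
    return max(untested, key=score)


def closed_loop(n_rounds=8, seed=0):
    rng = random.Random(seed)
    candidates = [i / 20 for i in range(21)]      # design space: doses 0.0 to 1.0
    observations = [(0.0, run_assay(0.0, rng))]   # one seed experiment
    for _ in range(n_rounds):
        dose = propose(candidates, observations)  # 1. model proposes
        response = run_assay(dose, rng)           # 2. "robot" executes
        # 3 and 4. Analyse and retrain: with a nearest-neighbour surrogate,
        # retraining reduces to adding the new observation to the data.
        observations.append((dose, response))
    best = max(observations, key=lambda obs: obs[1])
    return best, observations


best, observations = closed_loop()
print("best dose found:", best[0], "after", len(observations), "experiments")
```

In this toy setting the best observed dose should land near the hidden optimum, because the objective is smooth, one-dimensional, and cheap to measure. Those three properties are what the narrower biological examples share and what broad clinical biology lacks.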

What exists: autonomous loops in well-defined biological optimisation tasks. What does not exist: autonomous loops that produce clinically relevant biology without ongoing scientist supervision. The bottleneck is partly the integration of robotic execution with biological assays at clinical complexity, and partly the unsolved question of whether agentic systems can make the kind of judgment calls that determine whether a piece of biology is interesting.

What would have to be true: reliable, scalable assays for the biology in question; agents whose reasoning is auditable enough that scientists trust their experimental choices; institutional and regulatory frameworks that handle agent-led decisions at the lab level. The first is biology-bound; the second is AI-bound; the third is governance-bound.

Multi-modal biology foundation models

Current foundation models for biology mostly handle one modality at a time: sequence (ESM-2, ESM-3, Evo), structure (AlphaFold lineage), cells (Geneformer and successors), small molecules (various). AlphaFold 3 (Abramson et al., 2024), Boltz-2, and Chai-1 combine sequence, structure, and small-molecule chemistry; ESM-3 combines sequence, structure, and function annotations.

What exists: working two- and three-modality models. What does not exist: a model that handles sequence, structure, cell, tissue, organism, and behaviour in one representation. Whether such a model is even desirable is contested; biology may benefit more from specialised models that interoperate than from one universal model.

What would have to be true: pre-training data that span the modalities with adequate coverage, architectures that handle scale heterogeneity across modalities, and evaluations that test integrated reasoning rather than per-modality performance. Some progress on all three is visible; convergence is not.

Theoretical

AI-designed therapeutics through regulatory approval

AI-discovered, AI-optimised, and AI-designed candidates are in clinical trials in 2026. Several pharmaceutical companies report AI involvement at multiple steps of the discovery pipeline. None has yet completed a Phase 3 pivotal trial and received FDA approval primarily on the strength of an AI-generated molecule.

The bottleneck is not AI capability. The bottleneck is the same translational gap that limits all of drug discovery: efficacy in the right patient population, safety at therapeutic doses, manufacturability at scale, and a commercial model that works. AI improves the early stages (hit discovery, lead optimisation, structure prediction); the later stages (clinical efficacy, manufacturing, regulatory) are governed by biology and economics that no model accelerates by much. The Translational Evidence and Failure Modes chapter treats the gap in detail.

What would have to be true: enough AI-derived candidates moving through trials to produce statistical evidence of improved success rates, regulatory frameworks that accept AI-derived data as primary evidence, and patient stratification approaches that exploit AI-derived precision in selecting who benefits. All three are plausible; none is yet established.
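The first condition, "enough candidates to produce statistical evidence", can be made concrete with an exact binomial calculation. The sketch below is a minimal illustration, assuming a simple one-sided test against a fixed baseline success rate; the function names are hypothetical, and the baseline rate and cohort size in the usage line are placeholder values chosen for arithmetic convenience, not industry figures.

```python
from math import comb
from typing import Optional


def binomial_tail(n: int, k: int, p0: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p0): the chance of seeing at least k
    successes among n candidates if the true rate were only the baseline p0."""
    return sum(comb(n, i) * p0**i * (1 - p0) ** (n - i) for i in range(k, n + 1))


def min_successes_for_evidence(n: int, p0: float, alpha: float = 0.05) -> Optional[int]:
    """Smallest number of successes whose tail probability falls below alpha,
    or None if even n successes out of n would not be enough."""
    for k in range(n + 1):
        if binomial_tail(n, k, p0) < alpha:
            return k
    return None


# Placeholder inputs: 20 AI-derived candidates, hypothetical 10% baseline rate.
print(min_successes_for_evidence(n=20, p0=0.10))
```

The point of the exercise is the shape of the answer: with small cohorts, the number of successes needed to distinguish an improved rate from luck is a large fraction of the cohort, which is why a handful of promising trial read-outs does not yet settle the question.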

Personalised medicine at population scale

The vision is variant-level interpretation for every person in a population, integrated with phenotype, used to inform care. Partial evidence exists: AlphaMissense (Cheng et al., 2023) classifies approximately 89% of the roughly 71 million possible human missense variants as either likely benign or likely pathogenic, at research-grade quality. Polygenic risk scores are now used in some clinical settings. Pharmacogenomic guidance is increasingly automated.

What does not exist: clinical-quality interpretation for most variants of unknown significance, integration of variant-level prediction with phenotype at the level of routine care, and the equity infrastructure to ensure that personalised medicine is not personalised only for those whose ancestry is over-represented in training data. The third is the under-discussed risk; AI personalisation built on a non-representative reference panel will entrench inequities, not narrow them.

What would have to be true: regulatory and clinical pathways for AI-generated variant interpretations to inform care; functional-validation data at scale to ground the predictions; reference data that represent global genetic diversity. Movement is visible on all three, slowly.

Global access narrowing or widening

In 2026, the question of whether AI in life sciences narrows global health inequities or widens them is genuinely open. The narrowing case: open-weight foundation models, accessible cloud compute, and shared benchmarks let well-resourced research groups in low- and middle-income countries (LMICs) operate near the global frontier on questions of local relevance (neglected diseases, regional pathogen surveillance, climate-driven health). The widening case: compute and data concentration in a small number of institutions creates a dependence on infrastructure most countries cannot replicate, and AI-personalised medicine built on under-representative reference data exports inequities that take decades to correct.

Both trajectories are visible. The outcome depends on policy and funding decisions more than on AI capability: open-data norms, compute access programmes, training pipelines for LMIC AI-in-biology researchers, and the way frontier labs handle data sovereignty. The roadmap for responsible ML in healthcare (Wiens et al., 2019) frames the equity question for clinical AI and applies, with adjustments, to the broader life-sciences case.

What would have to be true for narrowing to win: sustained policy commitment to open-data and open-weight norms; targeted funding for LMIC AI-in-biology capacity; reference datasets that represent global diversity. None is automatic.

Workforce transition

Routine hypothesis generation, literature search, structural modelling, candidate ranking, and basic data analysis are increasingly AI-assisted. Wet-lab execution, experimental design judgment, and biological interpretation remain human-led. The net effect on the biology research workforce is uncertain. Plausible outcomes range from the most productive ten percent of researchers becoming radically more so (and the field consolidating around them) to a broader uplift in productivity that absorbs more biologists into AI-augmented roles than it displaces.

What would have to be true for the broader uplift outcome: education pathways that make AI tools second nature for the next generation of bench scientists; institutional reward structures that credit hybrid wet-dry-AI work; mid-career retraining that does not require leaving the field. The Workforce, Compute, and Institutional Readiness chapter covers the current state.

Beyond Current Capabilities

A single universal biology foundation model

The aspiration of one foundation model that handles sequence, structure, cell, tissue, organism, and behaviour in one representation is theoretically interesting but not a credible near-term direction. Biology’s scale heterogeneity is more extreme than language’s: nanometre-to-metre length scales, microsecond-to-decade time scales, and qualitatively different physics across scales. Universal-model claims should be evaluated against whether they handle this heterogeneity or whether they handle a narrower problem under a generous label.

Fully autonomous wet-lab science without human judgment

Demos exist; reliable production science does not. The genre of “AI ran an experiment and discovered” headlines mostly describes constrained optimisation in materials or microbiology, not open-ended scientific discovery. A general AI scientist that proposes meaningful biology, runs the right experiments, and interprets the results without scientist supervision is beyond current capabilities. Claims to the contrary should be evaluated against what an independent group reproduced, not against what a single team demonstrated under controlled conditions.

Ten-year predictions of which AI architecture will dominate

The current architectures (transformers in various forms, diffusion models for generation) have been dominant for less than a decade. The history chapter shows that each decade’s dominant architecture was largely unanticipated by practitioners in the decade before it. Claims that the current generation will still be dominant in 2036 should be evaluated against this prior, not on the strength of present momentum.

Practice Notes

Decisions today that hold up across most futures:

  • Build on durable foundations, not on the current generation’s architectures. The functions (structure prediction, perturbation modelling, sequence search, generative design) will outlast the named systems. Invest in the workflow patterns, not the brand-name tool.
  • Choose frontiers to track explicitly. A laboratory cannot engage with every frontier. Pick two or three that match your scientific question and the next-five-year roadmap, and track them with the same discipline you apply to your own field.
  • Engage at the partially-demonstrated frontier, not at the theoretical or beyond-current frontier. Working with virtual-cell methods for your cell type, autonomous loops for your assay class, or multi-modal models for your modality is reasonable. Building a programme around fully autonomous science or universal foundation models is not.
  • Insist on the same evidence discipline for forward-looking claims as for retrospective ones. The Evaluation Principles chapter applies to “we are about to do X” as much as to “we have done X.” Blind benchmarks, biology-aware splits, prospective validation, and DOME-compliant reporting are the test.
  • Read the equity question explicitly. Personalised medicine, global access, and workforce questions are not subsidiary to the technical frontiers; they are the questions that determine whether the technical frontiers produce broad benefit or concentrate it.
  • Re-read this chapter in 2030. A frontiers chapter is most useful as a measuring stick. Note which frontiers materialised, which stalled, and which collapsed into hype. The exercise calibrates judgment about the next round.
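
Of the evidence tests named in the practice notes, biology-aware splitting is the easiest to show in code. The idea is that related items (members of one protein family, cells from one donor, samples from one patient) must all fall on the same side of the train/test boundary, so that test performance is not inflated by near-duplicates. The following is a minimal sketch, assuming the group label is already known for each record; in practice the groups usually have to be derived first, for example by sequence-identity clustering, which is not shown here. The function name and the example records are hypothetical.

```python
import random
from collections import defaultdict


def group_split(records, group_of, test_fraction=0.2, seed=0):
    """Split records so that no group appears in both train and test."""
    groups = defaultdict(list)
    for record in records:
        groups[group_of(record)].append(record)

    group_ids = sorted(groups)               # sorted first for reproducibility
    random.Random(seed).shuffle(group_ids)

    target = test_fraction * len(records)
    train, test = [], []
    for gid in group_ids:
        # Fill the test set group by group until it reaches the target size;
        # every remaining group goes to train.
        bucket = test if len(test) < target else train
        bucket.extend(groups[gid])
    return train, test


# Hypothetical records: (sequence id, protein family label).
records = [
    ("seq1", "kinase"), ("seq2", "kinase"), ("seq3", "kinase"),
    ("seq4", "protease"), ("seq5", "protease"),
    ("seq6", "gpcr"), ("seq7", "gpcr"), ("seq8", "gpcr"),
    ("seq9", "transporter"), ("seq10", "transporter"),
]
train, test = group_split(records, group_of=lambda r: r[1])
shared = {r[1] for r in train} & {r[1] for r in test}
print(len(train), len(test), "shared families:", shared)
```

Because whole groups are assigned at once, the realised test fraction only approximates `test_fraction`. That trade-off is usually acceptable: an honest split of slightly the wrong size is more informative than a random split of exactly the right size.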