The Life Sciences AI Handbook: AI for Biomedical Discovery, Biotechnology, and Translational Research

Name: The Life Sciences AI Handbook
Author: Bryan Tegomoh

Tegomoh, Bryan; [Bryan Tegomoh, MD, MPH](https://bryantegomoh.com/)

Nucleic Acid and Genome Models

Published

May 24, 2026

Genome models move life sciences AI from protein sequence toward regulatory sequence, RNA, genome organization, and cellular context. The unit of modeling is no longer only a protein product.

Learning Objectives

Distinguish coding-sequence models from regulatory-sequence models.
Explain the importance of context length and assay tracks.
Evaluate genome-model claims against the organisms and cell types the model actually covers.

TL;DR

DNA and RNA models are strongest when the output is tied to measured functional genomic assays. Variant interpretation remains difficult when disease mechanism, cell context, and long-range regulation are uncertain.

Introduction

Evo was presented as a biological foundation model operating from molecular to genome scale (Nguyen et al., 2024). AlphaGenome focuses on regulatory variant-effect prediction from long DNA sequence context (Avsec et al., 2026). These systems differ from protein language models because regulatory function depends on cell type, chromatin context, and measurement modality.

Demonstrated

Demonstrated capability includes sequence-to-function prediction for specific genomic tracks and zero-shot or few-shot transfer for selected molecular tasks. Evo demonstrated modeling across DNA, RNA, and proteins in a report published in Science (Nguyen et al., 2024). AlphaGenome demonstrated regulatory variant-effect prediction from megabase-scale DNA sequence inputs in a report published in Nature (Avsec et al., 2026).

| Evidence Anchor | What It Supports | Practical Constraint |
| --- | --- | --- |
| Evo | Genome-scale sequence modeling across biological modalities | Training domain and organism coverage define use |
| AlphaGenome | Regulatory variant-effect prediction from long sequence | Human and mouse training context does not equal all biology |
| Bridge2AI | AI-ready genomic data as an infrastructure need | Ethics and metadata remain part of model validity |
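
Regulatory variant-effect prediction of the kind described above is commonly framed as a comparison between a model's output on the reference sequence and its output on the same sequence carrying the alternate allele. The sketch below illustrates that framing for a single-nucleotide variant. It is not the Evo or AlphaGenome API: `extract_window`, `variant_effect`, and the `predict` callable are hypothetical names, and `gc_fraction` is a toy stand-in for a trained sequence-to-function model, used only so the example runs.

```python
from typing import Callable


def extract_window(chrom_seq: str, pos: int, window: int) -> tuple[str, int]:
    """Return (window sequence, index of pos inside it) for a 0-based position.

    The window is centered on pos where possible and shifted inward at
    sequence edges so that it keeps the requested length when it can.
    """
    if not 0 <= pos < len(chrom_seq):
        raise ValueError("variant position is outside the sequence")
    half = window // 2
    start = max(0, pos - half)
    end = min(len(chrom_seq), start + window)
    start = max(0, end - window)  # re-anchor if the right edge was hit
    return chrom_seq[start:end], pos - start


def variant_effect(
    chrom_seq: str,
    pos: int,
    ref: str,
    alt: str,
    window: int,
    predict: Callable[[str], float],
) -> float:
    """Score a single-nucleotide variant as predict(alt) - predict(ref)."""
    ref_window, offset = extract_window(chrom_seq, pos, window)
    if ref_window[offset] != ref:
        raise ValueError("reference allele does not match the sequence")
    alt_window = ref_window[:offset] + alt + ref_window[offset + 1:]
    return predict(alt_window) - predict(ref_window)


def gc_fraction(seq: str) -> float:
    """Toy scorer (assumption): stands in for a real trained model."""
    return sum(base in "GC" for base in seq) / len(seq)


# Illustrative input only; a real use would read from a reference genome.
seq = "ATATATATAT"
delta = variant_effect(seq, pos=4, ref="A", alt="G", window=4, predict=gc_fraction)
print(delta)
```

The `window` argument plays the role of the context length: a real model sees only what falls inside it, so regulatory elements outside the window cannot influence the score. The reference-allele check also guards against scoring a variant against the wrong genome build.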

Theoretical

Theoretical capability includes genome-editing design that forecasts regulatory, transcriptomic, proteomic, and phenotypic outcomes before any experiment is run. The causal path from sequence to phenotype remains too context-dependent for such forecasts to be routinely reliable.

Beyond Current Capabilities

Whole-organism phenotype prediction from raw genome sequence alone is beyond current capabilities. Development, environment, epigenetics, the microbiome, and measurement context all shape phenotype, so sequence alone cannot support that claim.

Practice Notes

Track organism, genome build, cell type, assay, and context window.
Separate coding variant interpretation from noncoding regulatory interpretation.
Use perturbational validation for proposed regulatory edits.
Keep RNA structure, RNA expression, and RNA therapeutic design as related but distinct tasks.
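
The first two notes above can be sketched as a small provenance record attached to every prediction, plus a routing step that keeps coding and noncoding interpretation apart. This is a minimal sketch under stated assumptions: `PredictionContext` and `interpretation_track` are hypothetical names, the field values are illustrative, and the coding test is a simple overlap check against caller-supplied CDS intervals rather than a full annotation lookup.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PredictionContext:
    """Metadata that should travel with any genome-model prediction."""

    organism: str
    genome_build: str
    cell_type: str
    assay: str
    context_window_bp: int

    def __post_init__(self) -> None:
        # Refuse records with missing provenance instead of guessing defaults.
        for name in ("organism", "genome_build", "cell_type", "assay"):
            if not getattr(self, name).strip():
                raise ValueError(f"{name} must be recorded")
        if self.context_window_bp <= 0:
            raise ValueError("context_window_bp must be positive")


def interpretation_track(pos: int, cds_intervals: list[tuple[int, int]]) -> str:
    """Return 'coding' if pos lies in any half-open [start, end) CDS interval."""
    in_cds = any(start <= pos < end for start, end in cds_intervals)
    return "coding" if in_cds else "noncoding"


# Illustrative values only (assumptions, not taken from any cited study).
ctx = PredictionContext(
    organism="Homo sapiens",
    genome_build="GRCh38",
    cell_type="K562",
    assay="DNase-seq",
    context_window_bp=1_000_000,
)
track = interpretation_track(150, cds_intervals=[(100, 200), (400, 500)])
print(ctx.genome_build, track)
```

Making the record immutable and rejecting empty fields is a design choice: a prediction without its organism, build, cell type, assay, and context window is hard to compare or reproduce later.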