Nucleic Acid and Genome Models
Genome models extend life-sciences AI beyond protein sequence to regulatory sequence, RNA, genome organization, and cellular context. The unit of modeling is no longer only a protein product.
- Distinguish coding-sequence models from regulatory-sequence models.
- Explain the importance of context length and assay tracks.
- Evaluate genome-model claims against the organisms and cell types the model actually covers.
DNA and RNA models are strongest when the output is tied to measured functional genomic assays. Variant interpretation remains difficult when disease mechanism, cell context, and long-range regulation are uncertain.
Introduction
Evo was presented as a biological foundation model operating from molecular to genome scale (Nguyen et al., 2024). AlphaGenome focuses on regulatory variant-effect prediction from long DNA sequence context (Avsec et al., 2026). These systems differ from protein language models because regulatory function depends on cell type, chromatin context, and measurement modality.
Demonstrated
Demonstrated capabilities include sequence-to-function prediction for specific genomic tracks and zero-shot or few-shot transfer to selected molecular tasks. In a Science report indexed by PubMed, Evo demonstrated modeling across DNA, RNA, and proteins (Nguyen et al., 2024). In Nature, AlphaGenome demonstrated regulatory variant-effect prediction from megabase-scale DNA sequence inputs (Avsec et al., 2026).
| Evidence Anchor | What It Supports | Practical Constraint |
|---|---|---|
| Evo | Genome-scale sequence modeling across biological modalities | Training domain and organism coverage bound where it can be used |
| AlphaGenome | Regulatory variant-effect prediction from long sequence | Human and mouse training context does not cover all biology |
| Bridge2AI | AI-ready genomic data as an infrastructure need | Ethics and metadata remain part of model validity |
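
Variant-effect prediction from sequence is commonly framed as scoring the reference and alternate alleles in the same context window and comparing the predicted tracks. The sketch below illustrates that framing for a single-nucleotide variant. It is not the Evo or AlphaGenome API: `centered_window`, `variant_effect`, and `toy_predict` are hypothetical names, and `toy_predict` is a stand-in (GC fraction and a TATA motif count) for a trained sequence-to-function model.

```python
from typing import Callable, List, Tuple


def centered_window(chrom_seq: str, pos: int, window: int) -> Tuple[str, int]:
    """Return up to `window` bases around 0-based `pos`, plus the variant's offset in that window."""
    if not 0 <= pos < len(chrom_seq):
        raise ValueError("variant position lies outside the sequence")
    half = window // 2
    start = max(0, pos - half)
    end = min(len(chrom_seq), start + window)
    start = max(0, end - window)  # shift left if the window hit the sequence end
    return chrom_seq[start:end], pos - start


def variant_effect(
    chrom_seq: str,
    pos: int,
    ref: str,
    alt: str,
    window: int,
    predict: Callable[[str], List[float]],
) -> List[float]:
    """Per-track difference (alt minus ref) for a single-nucleotide variant."""
    if len(ref) != 1 or len(alt) != 1:
        raise ValueError("this sketch handles single-nucleotide variants only")
    seq, offset = centered_window(chrom_seq, pos, window)
    if seq[offset] != ref:
        raise ValueError("reference allele does not match the sequence")
    alt_seq = seq[:offset] + alt + seq[offset + 1:]
    ref_tracks = predict(seq)
    alt_tracks = predict(alt_seq)
    return [a - r for r, a in zip(ref_tracks, alt_tracks)]


def toy_predict(seq: str) -> List[float]:
    """Hypothetical stand-in for a trained model: [GC fraction, TATA motif count]."""
    gc_fraction = sum(base in "GC" for base in seq) / len(seq)
    return [gc_fraction, float(seq.count("TATA"))]


# Toy example: a T>G change that disrupts a TATA motif.
delta = variant_effect("ACGTTATAACGT", pos=4, ref="T", alt="G", window=8, predict=toy_predict)
print(delta)  # [0.125, -1.0]
```

With a real model, `predict` would return assay tracks for a stated organism and cell type, and the difference would only be interpretable within that context.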
Theoretical
Theoretical capabilities include genome-editing design that forecasts regulatory, transcriptomic, proteomic, and phenotypic outcomes before any experiment is run. The causal path from sequence to phenotype remains too context-dependent for such forecasts to be routinely reliable.
Beyond Current Capabilities
Whole-organism phenotype prediction from raw genome sequence alone is beyond current capabilities. Development, environment, epigenetics, the microbiome, and measurement context all shape phenotype and rule out that claim.
Practice Notes
- Track organism, genome build, cell type, assay, and context window.
- Separate coding variant interpretation from noncoding regulatory interpretation.
- Validate proposed regulatory edits with perturbation experiments.
- Keep RNA structure, RNA expression, and RNA therapeutic design as related but distinct tasks.
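
The tracking advice above can be made concrete with a small record attached to every prediction. This is a minimal sketch, not a standard schema: `PredictionContext` and `comparable` are hypothetical names, and the field values in the usage example are illustrative only.

```python
from dataclasses import dataclass

VARIANT_CLASSES = {"coding", "noncoding"}


@dataclass(frozen=True)
class PredictionContext:
    """Metadata to store next to each model prediction (hypothetical schema)."""
    organism: str
    genome_build: str
    cell_type: str
    assay: str
    context_window_bp: int
    variant_class: str  # kept explicit so coding and noncoding calls are never pooled

    def __post_init__(self) -> None:
        if self.context_window_bp <= 0:
            raise ValueError("context_window_bp must be positive")
        if self.variant_class not in VARIANT_CLASSES:
            raise ValueError("variant_class must be 'coding' or 'noncoding'")


def comparable(a: PredictionContext, b: PredictionContext) -> bool:
    """Treat two predictions as comparable only if every context field matches."""
    return a == b


# Illustrative values only.
k562 = PredictionContext("Homo sapiens", "GRCh38", "K562", "RNA-seq", 1_000_000, "noncoding")
hepg2 = PredictionContext("Homo sapiens", "GRCh38", "HepG2", "RNA-seq", 1_000_000, "noncoding")
print(comparable(k562, hepg2))  # False: the cell types differ
```

Requiring an exact match is a deliberately strict choice for the sketch; a real pipeline might relax it for specific fields once that relaxation has been justified.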