Variant Effect Prediction

Published: May 24, 2026

Variant effect prediction asks whether a sequence change alters molecular function, cellular state, or disease risk. The model output is not a diagnosis unless the clinical and biological context also supports that interpretation.

Learning Objectives
  • Separate molecular effect, cellular effect, and clinical significance.
  • Compare protein and regulatory variant prediction tasks.
  • Identify when experimental evidence is required before interpretation.
TL;DR

Variant models help prioritize variants and hypotheses. They do not replace segregation evidence, functional assays, population frequency, disease mechanism, or clinical interpretation.

Introduction

Protein language models, genome models, and functional genomics models now support different variant-effect tasks. AlphaGenome targets regulatory variant effects across genomic signals (Avsec et al., 2026). ESM-family protein models support protein sequence representations relevant to missense variant analysis (Lin et al., 2023).
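As a minimal sketch of how a protein language model can feed missense analysis, one widely used scoring approach compares the model's log-probability of the mutant residue with that of the wild-type residue at the same position. The source does not say which scoring method is used with ESM-family models, so this is an illustrative assumption. The function name `llr_score` and the table `TOY_PROBS` are hypothetical stand-ins for real model output, not part of any ESM API.

```python
import math

# Hypothetical per-position amino-acid probabilities, standing in for the
# output of a masked protein language model. Values are illustrative only.
TOY_PROBS = {
    1: {"M": 0.90, "L": 0.05, "V": 0.05},
    2: {"K": 0.50, "R": 0.40, "E": 0.10},
}


def llr_score(probs, position, wild_type, mutant):
    """Log-likelihood ratio: log p(mutant) - log p(wild type) at one position.

    Negative values mean the model finds the mutant residue less likely
    than the wild type. This is a molecular-level score, not a clinical call.
    """
    site = probs[position]
    return math.log(site[mutant]) - math.log(site[wild_type])


if __name__ == "__main__":
    # Score two toy substitutions written as wild type, position, mutant.
    for pos, wt, mut in [(1, "M", "V"), (2, "K", "R")]:
        print(f"{wt}{pos}{mut}: {llr_score(TOY_PROBS, pos, wt, mut):.3f}")
```

A score like this ranks substitutions by how surprising the model finds them. Whether that ranking tracks function or disease has to be checked against assay and clinical data.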

Demonstrated

Demonstrated capability includes ranking missense and regulatory variants for selected molecular readouts. AlphaGenome demonstrated improved regulatory variant-effect prediction across evaluated tasks in the Nature report (Avsec et al., 2026). Geneformer demonstrated that pretrained gene representations could support gene network predictions in selected settings (Theodoris et al., 2023).

Evidence Anchor               | What It Supports                                     | Practical Constraint
AlphaGenome                   | Regulatory sequence-to-function and variant scoring  | Cell type and assay coverage constrain interpretation
Protein language models       | Protein sequence representation for missense effects | Clinical classification needs additional evidence
Functional genomics resources | Measured assay tracks for variant interpretation     | Assay signal is not the same as disease causality

Theoretical

Theoretical capability includes joint models that connect a variant to its molecular effect, cellular response, tissue pathology, and patient phenotype. Such models are plausible research targets when trained on, and evaluated against, high-quality perturbational and clinical data.

Beyond Current Capabilities

Clinical-grade interpretation of arbitrary variants without population, family, functional, and phenotype data is beyond current capabilities. Variant interpretation remains a matter of integrating evidence, not transcribing model output.
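The evidence-integration point can be sketched as a simple completeness check: a model score is carried along, but the record is marked as prioritization-only until every evidence type named above is present. The names `REQUIRED_EVIDENCE`, `missing_evidence`, and `summarize`, and the status strings, are hypothetical and chosen for illustration. Real interpretation frameworks weigh evidence rather than merely checking its presence.

```python
# Evidence types named in the text; the identifiers are illustrative.
REQUIRED_EVIDENCE = ("population", "family", "functional", "phenotype")


def missing_evidence(available):
    """Return the required evidence types that are absent, in a fixed order."""
    return [kind for kind in REQUIRED_EVIDENCE if kind not in available]


def summarize(variant_id, model_score, available):
    """Attach a status to a scored variant without assigning a clinical label."""
    missing = missing_evidence(available)
    status = "ready for expert review" if not missing else "prioritization only"
    return {
        "variant": variant_id,
        "model_score": model_score,
        "status": status,
        "missing": missing,
    }


if __name__ == "__main__":
    # Hypothetical variant with only two of the four evidence types available.
    print(summarize("variant_A", 0.93, {"population", "functional"}))
```

Even with all four evidence types present, the sketch only routes the variant to review. It never emits a classification from the score alone.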

Practice Notes

  • Label the endpoint: molecular activity, expression, splicing, binding, cell state, or clinical classification.
  • Avoid treating one model score as decisive evidence.
  • Check for limitations in ancestry representation, ascertainment, and population frequency data.
  • Use calibrated thresholds only when validated for the intended disease and assay context.
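Two of the notes above, labeling the endpoint and applying thresholds only in validated contexts, can be made concrete with a small sketch. It assumes a lookup table keyed by disease and endpoint, which simplifies the "disease and assay context" named in the list. `Endpoint`, `ScoredVariant`, `VALIDATED_THRESHOLDS`, and the threshold value are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Endpoint(Enum):
    """Endpoints listed in the practice notes."""
    MOLECULAR_ACTIVITY = "molecular activity"
    EXPRESSION = "expression"
    SPLICING = "splicing"
    BINDING = "binding"
    CELL_STATE = "cell state"
    CLINICAL_CLASSIFICATION = "clinical classification"


@dataclass(frozen=True)
class ScoredVariant:
    variant_id: str
    endpoint: Endpoint  # every score carries an explicit endpoint label
    score: float


# Hypothetical table of thresholds validated for a (disease, endpoint) pair.
# The entry and its value are placeholders, not real calibration results.
VALIDATED_THRESHOLDS = {("example_disease", Endpoint.SPLICING): 0.8}


def exceeds_validated_threshold(variant: ScoredVariant, disease: str) -> Optional[bool]:
    """Compare against a threshold only if one was validated for this context.

    Returns None when no validated threshold exists, so callers cannot
    mistake an unvalidated comparison for a result.
    """
    threshold = VALIDATED_THRESHOLDS.get((disease, variant.endpoint))
    if threshold is None:
        return None
    return variant.score >= threshold


if __name__ == "__main__":
    v = ScoredVariant("variant_A", Endpoint.SPLICING, 0.85)
    print(exceeds_validated_threshold(v, "example_disease"))
    print(exceeds_validated_threshold(v, "other_disease"))
```

Returning `None` instead of `False` for unvalidated contexts is a deliberate choice here: it keeps "below threshold" distinct from "no threshold applies".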