Variant Effect Prediction
Variant effect prediction asks whether a sequence change alters molecular function, cellular state, or disease risk. The model output is not a diagnosis unless the clinical and biological context also supports that interpretation.
- Separate molecular effect, cellular effect, and clinical significance.
- Compare protein and regulatory variant prediction tasks.
- Identify when experimental evidence is required before interpretation.
Variant models help prioritize variants and generate hypotheses. They do not replace segregation evidence, functional assays, population frequency, disease mechanism, or clinical interpretation.
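The following minimal Python sketch shows that division of labor. It treats model scores as one input and reports a triage status rather than a classification. The category names in `NON_MODEL_CATEGORIES`, the `VariantEvidence` record, and the status strings are hypothetical illustrations, not a published scheme. Clinical interpretation itself stays with expert review.

```python
from dataclasses import dataclass, field

# Hypothetical evidence categories taken from the list above; illustrative
# names only, not a published classification framework.
NON_MODEL_CATEGORIES = {
    "segregation",
    "functional_assay",
    "population_frequency",
    "disease_mechanism",
}


@dataclass
class VariantEvidence:
    variant_id: str
    model_scores: dict[str, float] = field(default_factory=dict)
    other_evidence: set[str] = field(default_factory=set)


def interpretation_status(ev: VariantEvidence) -> str:
    """Return a triage label, never a clinical classification."""
    missing = NON_MODEL_CATEGORIES - ev.other_evidence
    if not missing:
        return "ready for expert clinical review"
    if missing == NON_MODEL_CATEGORIES:
        # Model output alone supports prioritization only.
        return "prioritized for follow-up" if ev.model_scores else "no evidence yet"
    return "partial evidence; missing " + ", ".join(sorted(missing))
```

Model scores can move a variant into the follow-up queue. However, they never replace a missing non-model category, so no score value can advance the status on its own.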
Introduction
Protein language models, genome models, and functional genomics models now support different variant-effect tasks. AlphaGenome targets regulatory variant effects across genomic signals (Avsec et al., 2026). ESM-family protein language models provide sequence representations relevant to missense variant analysis (Lin et al., 2023).
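Masked protein language models are often used to score missense variants with a common heuristic. The heuristic compares the model's probabilities for the alternate and reference residues at the masked position. The sketch below computes that log-likelihood ratio from a probability table you supply. It does not call ESM or any other library. It assumes `probs_at_position` is the model's softmax over the 20 standard amino acids with the variant position masked. It also assumes compact one-letter notation such as `R3H` rather than full HGVS.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def parse_missense(variant: str) -> tuple[str, int, str]:
    """Parse one-letter notation like 'R117H' into (ref, 1-based position, alt)."""
    ref, pos, alt = variant[0], int(variant[1:-1]), variant[-1]
    if ref not in AMINO_ACIDS or alt not in AMINO_ACIDS:
        raise ValueError(f"unsupported residue in {variant}")
    return ref, pos, alt


def masked_marginal_llr(variant: str, sequence: str,
                        probs_at_position: dict[str, float]) -> float:
    """Return log P(alt) - log P(ref) at the masked variant position.

    probs_at_position is assumed to be the model's output distribution over
    amino acids when the variant position is masked.
    """
    ref, pos, alt = parse_missense(variant)
    # Guard against scoring a variant against the wrong reference sequence.
    if sequence[pos - 1] != ref:
        raise ValueError(f"reference mismatch at position {pos}")
    return math.log(probs_at_position[alt]) - math.log(probs_at_position[ref])
```

For example, suppose `probs_at_position["R"] = 0.8` and `probs_at_position["H"] = 0.05`. Then `masked_marginal_llr("R3H", "MKR", probs_at_position)` returns ln(0.05 / 0.8), about -2.77. A negative value means the model finds the substitution less likely than the reference residue. This is a molecular-level ranking signal, not a pathogenicity call.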
Demonstrated
Demonstrated capability includes ranking missense and regulatory variants for selected molecular readouts. AlphaGenome demonstrated improved regulatory variant-effect prediction across evaluated tasks in the Nature report (Avsec et al., 2026). Geneformer, pretrained on large collections of single-cell transcriptomes, demonstrated that pretrained gene representations could support gene network predictions in selected settings (Theodoris et al., 2023).
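One way to rank variants for a selected readout is to sort them by the difference between alternate-allele and reference-allele predictions for that readout alone. The nested-dictionary input and the `rank_by_readout` name are assumptions for illustration. Real tools expose their own scoring interfaces and aggregation choices.

```python
def rank_by_readout(predictions: dict[str, dict[str, tuple[float, float]]],
                    readout: str) -> list[tuple[str, float]]:
    """Rank variants by |alt - ref| for one named readout.

    predictions maps variant id -> readout name -> (ref_pred, alt_pred).
    Variants without a prediction for the readout are excluded, not scored as 0.
    """
    scored = []
    for variant_id, readouts in predictions.items():
        if readout not in readouts:
            continue
        ref_pred, alt_pred = readouts[readout]
        scored.append((variant_id, alt_pred - ref_pred))
    # Sort by effect magnitude but keep the signed delta for interpretation.
    return sorted(scored, key=lambda item: abs(item[1]), reverse=True)
```

The function drops a variant that has no prediction for the requested readout instead of scoring it as zero. This prevents a missing track from being read as evidence of no effect.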
| Evidence Anchor | What It Supports | Practical Constraint |
|---|---|---|
| AlphaGenome | Regulatory sequence-to-function and variant scoring | Cell type and assay coverage constrain interpretation |
| Protein language models | Protein sequence representation for missense effects | Clinical classification needs additional evidence |
| Functional genomics resources | Measured assay tracks for variant interpretation | Assay signal is not the same as disease causality |
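The coverage constraint in the first row can be made explicit. The sketch returns a score only for a cell type and assay the model actually covers, and it labels even a covered score as a molecular readout. In this sketch, `track_deltas` (precomputed alt-minus-ref deltas keyed by cell type and assay) and `ContextualScore` are hypothetical names. Any cell-type or assay labels used with it are placeholders.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ContextualScore:
    cell_type: str
    assay: str
    delta: Optional[float]  # None when the context is not covered
    note: str


def score_in_context(track_deltas: dict[tuple[str, str], float],
                     cell_type: str, assay: str) -> ContextualScore:
    """Look up an alt-minus-ref delta without extrapolating to uncovered contexts."""
    key = (cell_type, assay)
    if key not in track_deltas:
        # Deliberately no fallback to a "similar" cell type or assay.
        return ContextualScore(cell_type, assay, None,
                               "context not covered; do not extrapolate")
    return ContextualScore(cell_type, assay, track_deltas[key],
                           "assay-level effect; not evidence of disease causality")
```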
Theoretical
Theoretical capability includes joint models that connect variant, molecular effect, cellular response, tissue pathology, and patient phenotype. These are plausible research targets when linked to high-quality perturbational and clinical data.
Beyond Current Capabilities
Capabilities beyond current reach include clinical-grade interpretation of arbitrary variants in the absence of population, family, functional, and phenotype data. Variant interpretation remains evidence integration, not model output transcription.
Practice Notes
- Label the endpoint: molecular activity, expression, splicing, binding, cell state, or clinical classification.
- Avoid treating one model score as decisive evidence.
- Check ancestry, ascertainment, and population frequency limitations.
- Use calibrated thresholds only when validated for the intended disease and assay context.
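Several of these notes can be encoded as checks, as in the sketch below. It attaches an explicit endpoint to every score. It applies a cutoff only when the score's endpoint, disease, and assay match the context in which the cutoff was validated. It also flags ancestry groups where the variant is more frequent than a supplied threshold or where sampling is thin. `Endpoint`, `CalibratedThreshold`, `LabeledScore`, `max_credible_af`, `min_allele_number`, and all group labels are hypothetical names. Choosing the thresholds themselves is a disease-specific judgment that this code does not make.

```python
from dataclasses import dataclass
from enum import Enum


class Endpoint(Enum):
    MOLECULAR_ACTIVITY = "molecular activity"
    EXPRESSION = "expression"
    SPLICING = "splicing"
    BINDING = "binding"
    CELL_STATE = "cell state"
    CLINICAL_CLASSIFICATION = "clinical classification"


@dataclass(frozen=True)
class CalibratedThreshold:
    """A score cutoff plus the context it was validated in (hypothetical schema)."""
    endpoint: Endpoint
    disease: str
    assay: str
    cutoff: float


@dataclass(frozen=True)
class LabeledScore:
    variant_id: str
    endpoint: Endpoint
    disease: str
    assay: str
    score: float


def apply_threshold(s: LabeledScore, t: CalibratedThreshold) -> str:
    """Apply a cutoff only in the exact context where it was validated."""
    if (s.endpoint, s.disease, s.assay) != (t.endpoint, t.disease, t.assay):
        return "threshold not validated for this context; report raw score only"
    side = "above" if s.score >= t.cutoff else "below"
    # Even a validated cutoff yields one evidence line, not a decision.
    return f"{side} validated cutoff for {s.endpoint.value}; combine with other evidence"


def frequency_flags(allele_freq: dict[str, float], allele_number: dict[str, int],
                    max_credible_af: float,
                    min_allele_number: int) -> dict[str, list[str]]:
    """Flag ancestry groups where the variant is too common or sampling is sparse."""
    too_common = [g for g, af in allele_freq.items() if af > max_credible_af]
    underpowered = [g for g, n in allele_number.items() if n < min_allele_number]
    return {"too_common": sorted(too_common), "underpowered": sorted(underpowered)}
```

Matching on the full (endpoint, disease, assay) tuple is deliberately strict. A cutoff validated for splicing in one disease and assay is not reused for expression, or for the same endpoint in another disease.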