Perturbation Prediction and Virtual Cells
Perturbation prediction is the most direct route from representation learning to biological action. The model must answer what changes after an intervention, not only what a cell resembles.
- Distinguish observational cell embeddings from perturbational models.
- Identify realistic endpoints for virtual cell claims.
- Use causal language only when the design supports it.
Virtual cell work is promising when framed as perturbation prediction for defined outputs. It becomes misleading when a transcriptomic forecast is treated as a full model of the cell.
Introduction
GEARS uses graph-enhanced modeling to predict transcriptional outcomes of novel multigene perturbations in selected settings (Roohani et al., 2024). The Arc Institute Virtual Cell Challenge illustrates the field’s move toward public benchmarks for perturbation response prediction (Arc Institute, 2025).
Demonstrated
Demonstrated capability is limited to transcriptional perturbation prediction for selected genes, cell contexts, and benchmark designs. GEARS predicted transcriptional outcomes for some multigene perturbations that were not seen during training (Roohani et al., 2024). Public efforts such as the Virtual Cell Challenge reflect growing demand for independent evaluation of virtual cell models (Arc Institute, 2025).
| Evidence Anchor | What It Supports | Practical Constraint |
|---|---|---|
| GEARS | Multigene perturbation outcome prediction | Generalization depends on graph priors and cell context |
| Virtual Cell Challenge | Public benchmark framing for cell response prediction | A challenge metric is not a full cell model |
| scGPT | Single-cell foundation model baseline | Pretraining alone does not establish causal validity |
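Independent evaluation depends on what the metric actually rewards. The sketch below is a minimal illustration, not the scoring rule of any benchmark named above: it assumes mean expression vectors over a handful of hypothetical genes and compares a raw correlation with a correlation computed on the change from control. The function names and toy values are assumptions made for this example.

```python
from math import sqrt
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return float("nan")  # correlation is undefined for a constant vector
    return cov / sqrt(vx * vy)


def delta_pearson(
    control: Sequence[float],
    observed: Sequence[float],
    predicted: Sequence[float],
) -> float:
    """Correlate predicted and observed change from control, gene by gene."""
    obs_delta = [o - c for o, c in zip(observed, control)]
    pred_delta = [p - c for p, c in zip(predicted, control)]
    return pearson(obs_delta, pred_delta)


# Toy mean-expression vectors over four hypothetical genes (illustrative values).
control = [1.0, 2.0, 3.0, 4.0]
observed = [1.0, 3.0, 2.0, 4.5]   # measured after the perturbation
predicted = [1.1, 2.8, 2.3, 4.4]  # output of some hypothetical model

# A "no change" baseline that simply returns the control profile.
raw_baseline_score = pearson(observed, control)
delta_baseline_score = delta_pearson(control, observed, control)  # undefined (nan)

# The hypothetical model, scored on change from control.
delta_model_score = delta_pearson(control, observed, predicted)
```

In this toy case the no-change baseline still earns a high raw correlation, because perturbed and control profiles share most of their structure. Scoring on the change from control removes that shared structure, which is one reason to state the metric alongside any capability claim.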
Theoretical
Theoretical capability includes models that rank candidate interventions before a CRISPR, drug, or combination screen is run. This is plausible for defined cell systems and measured outputs, especially when active learning feeds new experimental results back into the model.
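One way to picture such ranking is an acquisition rule that favors interventions with a large predicted effect or high model disagreement. The sketch below uses an upper-confidence-bound style score over an ensemble of predictions. It is an assumption chosen for illustration, not a method taken from the cited work; the intervention names, the ensemble values, and the `exploration_weight` parameter are all hypothetical.

```python
from statistics import mean, pstdev
from typing import Dict, List, Sequence


def rank_candidates(
    ensemble_predictions: Dict[str, Sequence[float]],
    exploration_weight: float = 1.0,
    batch_size: int = 2,
) -> List[str]:
    """Return the top candidates by mean predicted effect plus weighted disagreement.

    Each value in `ensemble_predictions` holds one predicted effect per
    ensemble member for a single defined readout.
    """
    scored = []
    for name, preds in ensemble_predictions.items():
        # Disagreement across ensemble members stands in for uncertainty.
        score = mean(preds) + exploration_weight * pstdev(preds)
        scored.append((score, name))
    # Highest score first; ties are broken alphabetically for determinism.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [name for _, name in scored[:batch_size]]


# Hypothetical predicted effects from a three-member ensemble.
candidates = {
    "KO_A": [1.1, 1.2, 1.3],
    "KO_B": [0.2, 1.4, 0.5],
    "KO_C": [0.1, 0.1, 0.2],
    "KO_A+KO_B": [1.0, 1.6, 0.4],
}

exploit_only = rank_candidates(candidates, exploration_weight=0.0)
explore_more = rank_candidates(candidates, exploration_weight=2.0)
```

With no exploration weight the ranking follows the mean prediction alone. A larger weight pulls uncertain candidates into the batch, which is where a new experiment would be most informative for the next round of training.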
Beyond Current Capabilities
A complete executable cell model that forecasts all molecular, phenotypic, and temporal consequences of arbitrary interventions is beyond current capabilities. Current data cover only slices of cellular behavior: particular cell contexts, perturbations, readouts, and time points.
Practice Notes
- Name the perturbation type: knockout, knockdown, activation, compound, dose, timing, or combination.
- Name the output: expression, morphology, viability, secretion, or functional assay.
- Use held-out perturbations and held-out cell contexts separately.
- Avoid causal claims from observational pretraining alone.
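The notes above can be sketched as a small record type plus a split that keeps the two kinds of generalization apart. This is a minimal sketch under stated assumptions: the `Experiment` fields, the split names, and the example genes and cell lines are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Set


@dataclass(frozen=True)
class Experiment:
    perturbation: str        # e.g. a target gene or compound identifier
    perturbation_type: str   # knockout, knockdown, activation, compound, ...
    cell_context: str        # cell line or cell type
    output: str              # expression, morphology, viability, ...


def split_experiments(
    experiments: Iterable[Experiment],
    held_out_perturbations: Set[str],
    held_out_contexts: Set[str],
) -> Dict[str, List[Experiment]]:
    """Separate test sets by what is new: the perturbation, the context, or both."""
    splits: Dict[str, List[Experiment]] = {
        "train": [],
        "test_new_perturbation": [],
        "test_new_context": [],
        "test_both_new": [],
    }
    for exp in experiments:
        new_pert = exp.perturbation in held_out_perturbations
        new_ctx = exp.cell_context in held_out_contexts
        if new_pert and new_ctx:
            splits["test_both_new"].append(exp)
        elif new_pert:
            splits["test_new_perturbation"].append(exp)
        elif new_ctx:
            splits["test_new_context"].append(exp)
        else:
            splits["train"].append(exp)
    return splits


# Hypothetical design: three knockouts measured in two cell lines.
experiments = [
    Experiment(gene, "knockout", line, "expression")
    for gene in ("geneA", "geneB", "geneC")
    for line in ("cell_line_1", "cell_line_2")
]

splits = split_experiments(
    experiments,
    held_out_perturbations={"geneC"},
    held_out_contexts={"cell_line_2"},
)
```

Reporting a score per split shows whether a model generalizes to unseen perturbations, to unseen cell contexts, or to neither. Pooling the splits into one number hides that distinction.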