Systems Biology and Multiscale Modeling
Systems biology is where representation learning meets mechanism. The question is not only whether a model embeds a cell state well, but whether it captures enough regulatory, pathway, spatial, and temporal structure to support an intervention decision. This chapter gives systems biology a permanent home in the handbook so that gene regulatory networks, pathway reasoning, multiscale simulation, and virtual-organism claims are not forced into the single-cell or therapeutic chapters, where the scope is narrower.
- Distinguish representation models from mechanistic or causal models
- Identify the evidence required for gene regulatory network claims
- Separate pathway enrichment, regulatory inference, and multiscale simulation
- Recognize when a virtual-cell claim becomes a virtual-organism claim
- Evaluate whether a model output has a falsifying perturbation experiment
Introduction
Many AI biology claims fail at the scale boundary. A model may predict a transcript, protein structure, or cell-state embedding and then be used as if it had predicted a pathway, tissue response, or organism phenotype. Systems biology is the discipline that prevents that category error. It forces the model claim to name the regulatory network, pathway, dynamical system, feedback loop, measurement time scale, and intervention that would test the claim.
Gene regulatory network (GRN) inference is the immediate bridge from single-cell and multi-omic data to systems reasoning. Reviews of GRN inference in the single-cell multi-omics era emphasize that transcriptomic and chromatin-accessibility data improve regulatory maps, but that benchmarking and experimental assessment remain central (Badia-i-Mompel et al., 2023). Whole-cell computational modeling represents the older mechanistic tradition: Karr and colleagues built a genotype-to-phenotype model of Mycoplasma genitalium, and the effort that success required also shows why comprehensive models scale slowly (Karr et al., 2012).
Demonstrated
Demonstrated capability includes GRN inference as a structured hypothesis-generation layer, not as a complete causal map. Single-cell multi-omics improves the evidence available for transcription factor and chromatin-state relationships, and perturbation datasets provide stronger tests than observational co-expression alone.
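The gap between co-expression and perturbation evidence can be illustrated with a toy simulation. This is a minimal sketch, not a GRN inference method: the wiring (a shared regulator TF driving genes A and B, and A driving gene C), the noise levels, and the function names are all assumptions made for illustration.

```python
import random


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def mean(xs):
    return sum(xs) / len(xs)


def simulate_cells(n_cells, knock_down_a=False, seed=0):
    """Toy expression data. Hypothetical wiring: TF -> A, TF -> B, A -> C."""
    rng = random.Random(seed)
    a_vals, b_vals, c_vals = [], [], []
    for _ in range(n_cells):
        tf = rng.gauss(5.0, 1.0)           # shared upstream regulator
        a = tf + rng.gauss(0.0, 0.3)       # A is driven by TF
        if knock_down_a:
            a = 0.0                        # idealized complete knockdown of A
        b = tf + rng.gauss(0.0, 0.3)       # B is driven by TF, not by A
        c = 0.8 * a + rng.gauss(0.0, 0.3)  # C is driven by A
        a_vals.append(a)
        b_vals.append(b)
        c_vals.append(c)
    return a_vals, b_vals, c_vals


def knockdown_shift(n_cells=2000, seed=0):
    """Change in mean B and mean C when A is knocked down.

    Reusing the seed gives matched cells, an idealization that real
    experiments do not have.
    """
    _, b_ctrl, c_ctrl = simulate_cells(n_cells, seed=seed)
    _, b_kd, c_kd = simulate_cells(n_cells, knock_down_a=True, seed=seed)
    return mean(b_kd) - mean(b_ctrl), mean(c_kd) - mean(c_ctrl)


if __name__ == "__main__":
    a, b, c = simulate_cells(2000)
    print(f"corr(A, B) = {pearson(a, b):.2f}")
    print(f"corr(A, C) = {pearson(a, c):.2f}")
    shift_b, shift_c = knockdown_shift()
    print(f"knockdown of A shifts mean B by {shift_b:.2f}")
    print(f"knockdown of A shifts mean C by {shift_c:.2f}")
```

In this toy system both A-B and A-C are strongly correlated in the observational data, so co-expression alone would propose both edges. Only the simulated knockdown separates them: C drops when A is removed, while B does not move because its correlation with A came entirely from the shared regulator.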
Demonstrated capability also includes narrow whole-cell models in simple organisms where the scope of molecular components and interactions is unusually constrained. These models are valuable because they expose the information burden: even for minimal organisms, a useful model requires curated mechanisms, parameter estimates, and phenotype validation.
Theoretical
Theoretical capability includes AI-assisted multiscale models that connect sequence, chromatin, RNA, protein, cell state, tissue architecture, and organism phenotype. Current methods have pieces of that stack. They do not yet provide a general path from molecular input to reliable organism-level behavior.
Theoretical capability also includes hybrid systems that combine foundation-model representations with mechanistic simulators. This is a plausible direction because representation models can compress high-dimensional observations, while mechanistic models encode constraints. The hard part is calibration against experiments that perturb the system.
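Why calibration needs perturbation can be seen even in a one-gene mechanistic model. The sketch below assumes a hypothetical model dx/dt = k_syn - k_deg * x and noise-free synthetic data; it covers only the mechanistic half of a hybrid system, and the parameter values and function names are illustrative, not drawn from any published model.

```python
import math


def steady_state(k_syn, k_deg):
    """Steady state of dx/dt = k_syn - k_deg * x."""
    return k_syn / k_deg


def shutoff_curve(x0, k_deg, times):
    """Levels after synthesis is blocked (k_syn = 0): x(t) = x0 * exp(-k_deg * t)."""
    return [x0 * math.exp(-k_deg * t) for t in times]


def fit_k_deg(times, levels):
    """Estimate k_deg as minus the least-squares slope of log(level) vs time."""
    logs = [math.log(v) for v in levels]
    n = len(times)
    mean_t, mean_l = sum(times) / n, sum(logs) / n
    num = sum((t - mean_t) * (l - mean_l) for t, l in zip(times, logs))
    den = sum((t - mean_t) ** 2 for t in times)
    return -num / den


# Hypothetical "true" system used to generate synthetic observations.
TRUE_K_SYN, TRUE_K_DEG = 2.0, 0.5
observed_steady = steady_state(TRUE_K_SYN, TRUE_K_DEG)

# Unperturbed data: every candidate below reproduces the same steady state,
# so steady-state measurements alone cannot choose between them.
candidates = [(2.0, 0.5), (4.0, 1.0), (1.0, 0.25)]
steady_errors = [abs(steady_state(ks, kd) - observed_steady) for ks, kd in candidates]

# Perturbation data: a simulated synthesis shut-off time course.
times = [0.0, 1.0, 2.0, 4.0]
observed_decay = shutoff_curve(observed_steady, TRUE_K_DEG, times)

k_deg_hat = fit_k_deg(times, observed_decay)
k_syn_hat = k_deg_hat * observed_steady  # the steady state then fixes k_syn

if __name__ == "__main__":
    print("steady-state errors per candidate:", steady_errors)
    print(f"k_deg estimate: {k_deg_hat:.3f}, k_syn estimate: {k_syn_hat:.3f}")
```

The steady state constrains only the ratio k_syn / k_deg, so all three candidates fit the unperturbed data equally well. The shut-off experiment identifies k_deg, and the steady state then pins down k_syn. Real systems add measurement noise, feedback, and many more parameters, which is where the calibration burden grows.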
Beyond current capabilities
Beyond current capabilities are virtual organisms that reliably forecast phenotype from genome, environment, development, microbiome, and intervention history across arbitrary contexts. The measurement, mechanism, and validation requirements exceed what current datasets support.
Also beyond current capabilities are causal maps derived from observational data alone. Without perturbation, temporal, or orthogonal evidence, most inferred network edges remain hypotheses.
Practice Notes
- Name the scale: regulatory element, gene, pathway, cell, tissue, organ, organism, or ecosystem.
- Ask whether the network edge is observational, perturbational, literature-derived, or mechanistic.
- Treat pathway enrichment as a clue, not a model.
- Require perturbation data before treating a regulatory claim as causal.
- Keep virtual-organism claims separate from virtual-cell claims.
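The notes above can be sketched as a small claim record with a review function. This is a hypothetical structure, not an established schema: the class names, the `measured_scale` versus `claimed_scale` split, and the flag wording are assumptions introduced for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Scale(Enum):
    """Scales from the notes, ordered from smallest to largest."""
    REGULATORY_ELEMENT = 1
    GENE = 2
    PATHWAY = 3
    CELL = 4
    TISSUE = 5
    ORGAN = 6
    ORGANISM = 7
    ECOSYSTEM = 8


class Evidence(Enum):
    OBSERVATIONAL = "observational"
    PERTURBATIONAL = "perturbational"
    LITERATURE = "literature-derived"
    MECHANISTIC = "mechanistic"


@dataclass(frozen=True)
class Claim:
    statement: str
    measured_scale: Scale      # scale at which the data were collected
    claimed_scale: Scale       # scale at which the conclusion is stated
    evidence: frozenset        # set of Evidence members
    stated_as_causal: bool = False


def review(claim):
    """Return a list of flags; an empty list means no note was triggered."""
    flags = []
    if claim.claimed_scale.value > claim.measured_scale.value:
        flags.append(
            f"scale jump: measured at {claim.measured_scale.name.lower()}, "
            f"claimed at {claim.claimed_scale.name.lower()}"
        )
    if claim.stated_as_causal and Evidence.PERTURBATIONAL not in claim.evidence:
        flags.append("causal wording without perturbation data")
    return flags


if __name__ == "__main__":
    risky = Claim(
        statement="Embedding shift predicts organism-level drug response",
        measured_scale=Scale.CELL,
        claimed_scale=Scale.ORGANISM,
        evidence=frozenset({Evidence.OBSERVATIONAL}),
        stated_as_causal=True,
    )
    for flag in review(risky):
        print(flag)
```

A record like this does not validate a claim. It only makes the scale and the evidence type explicit, so that a virtual-cell result reported at organism scale, or a causal statement backed only by observational edges, is visible at review time.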