The Life Sciences AI Handbook: Steering Frontier Models in Biology

Tegomoh, Bryan

Protein Structure Prediction

Published

July 7, 2026

Protein structure prediction crossed a research threshold at CASP14, when AlphaFold 2 produced single-chain predictions whose median backbone accuracy was comparable to experimental structures (Jumper et al., 2021). Three years later, the AlphaFold Protein Structure Database covers over 214 million predicted structures (Varadi et al., 2024): effectively the predicted structural proteome of life. The hard questions are no longer whether a model can fold a protein; they are which molecular state the prediction represents, what ligand and partner geometry it captures, and what an experiment still has to prove.

Learning Objectives

Use this chapter to:

Infer useful three-dimensional protein and biomolecular structures when experimental structures are unavailable, incomplete, or too slow for the research question.
Judge whether a predicted structure is usable by weighing confidence scores, conformational state, disorder, ligands, nucleic acids, complexes, and assay context.

Prerequisites: AI for the Life Sciences recommended, Foundation Models for Biology helpful for the ESMFold and ESM-3 sections.

Chapter Summary (TL;DR)

Summary: Infer useful three-dimensional protein and biomolecular structures when experimental structures are unavailable, incomplete, or too slow for the research question. Single-chain structure prediction is routine for many proteins; complexes, ligand interactions, dynamics, and restricted releases remain more complicated.

Key point: Confidence scores, conformational state, disorder, ligands, nucleic acids, complexes, and assay context decide whether a predicted structure is usable. Open question: how well predictions handle dynamics, ligands, complexes, and decisions that depend on more than a static fold.

Bottom line: Structure prediction feeds protein design, antibodies, variant interpretation, vaccine antigen work, chemical biology, and small-molecule discovery.

Field Guide

What is this field trying to solve? Infer useful three-dimensional protein and biomolecular structures when experimental structures are unavailable, incomplete, or too slow for the research question.

What is the core idea? Confidence scores, conformational state, disorder, ligands, nucleic acids, complexes, and assay context decide whether a predicted structure is usable.

What is the current state of the field? Single-chain structure prediction is routine for many proteins; complexes, ligand interactions, dynamics, and restricted releases remain more complicated.

What do we know, and what remains open? Known reference points include AlphaFold 2, AlphaFold 3, RoseTTAFold, ESMFold, Boltz, Chai, OpenFold, PDB, AlphaFold DB, CASP, CAMEO, and complex-prediction benchmarks. What remains open is how well predictions handle dynamics, ligands, complexes, and decisions that depend on more than a static fold.

Why does this matter? Structure prediction feeds protein design, antibodies, variant interpretation, vaccine antigen work, chemical biology, and small-molecule discovery.

Introduction: From Grand Challenge to Routine Input

December 2020, CASP14 (online): John Moult announces the results of the biennial Critical Assessment of protein Structure Prediction, in which AlphaFold 2 reaches a median backbone accuracy of about 1 Å (0.96 Å r.m.s.d.95) across the assessed domains, not only the hardest free-modeling targets. The next-best method reached 2.8 Å on the same measure. By the operational definition Moult and others used, the protein-folding grand challenge was solved for most cases (Jumper et al., 2021).

July 2022: DeepMind and EMBL-EBI expand the AlphaFold Protein Structure Database, first launched in July 2021, to predictions for nearly every cataloged UniProt protein. By the 2024 database update, coverage exceeds 214 million sequences (Varadi et al., 2024).

May 2024: AlphaFold 3 extends prediction to nucleic acids, ions, small-molecule ligands, and modified residues using a diffusion-based generative architecture (Abramson et al., 2024). Initial release is a web server with limits and without training code.

October 2024: The Royal Swedish Academy of Sciences awards the 2024 Nobel Prize in Chemistry jointly to Demis Hassabis and John Jumper (AlphaFold) and David Baker (computational protein design and Rosetta).

November 2024: MIT’s Barzilay and Jaakkola groups release Boltz-1, an open-source AlphaFold 3-class biomolecular interaction model (Wohlwend et al., 2024, preprint). The earlier open Chai-1 (Chai Discovery et al., 2024, preprint) had already established the community pattern.

These are not just modeling milestones. They are a change in the default operating assumption of structural biology: for many proteins, a usable structural model is available before any experiment is run.

The first proteome-scale use case made the practical boundary clear. AlphaFold was applied to 98.5% of the human proteome, providing high-confidence models for many well-folded domains while also marking extensive low-confidence regions that often correspond to disorder or unresolved structural context (Tunyasuvunakool et al., 2021). The value is not only the model coordinates; it is the confidence map that tells the researcher what not to trust.

The chapter that follows is organized around four questions a researcher actually faces:

Which system should I use, and why?
What does the confidence metric actually mean for my decision?
What does the prediction not tell me?
What experiment do I still owe before I commit a program decision?

The Architecture Lineage

Single-Chain Prediction (AlphaFold 2 and Successors)

AlphaFold 2 (Jumper et al., 2021) combined three ideas:

Multiple sequence alignment (MSA) input. Evolutionary covariation across homologs encodes structural constraints. AlphaFold 2 reads the MSA as a feature, not as input text.
Triangle attention. Updates over the pairwise residue representation that enforce geometric self-consistency among residue triplets, applied in each of the 48 Evoformer blocks.
End-to-end differentiable structure module. A direct geometric prediction of backbone frames and side-chain rotamers, trained with structural and auxiliary losses.

The system was open-sourced shortly after publication. RoseTTAFold (Baek et al., 2021) shipped from the Baker lab in parallel with a different “three-track” architecture and was equally consequential because it established that AlphaFold 2’s accuracy was not architecture-locked.

MSA-Free Prediction (ESMFold and OmegaFold)

For orphan proteins (no homologs), de novo designs (no evolutionary signal), or speed-critical workflows, MSA generation is the bottleneck. Two responses emerged:

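ESMFold (Lin et al., 2023). Meta’s ESMFold uses the ESM-2 protein language model, a family scaled up to 15 billion parameters, to predict structure from a single sequence without an MSA. Accuracy is below AlphaFold 2 on average, but inference can be an order of magnitude or more faster because the MSA search is skipped, which matters most when MSAs are weak or expensive to build.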
OmegaFold (Wu et al., 2022, preprint). Single-sequence prediction with comparable goals, useful particularly for designed proteins.

The practical rule: if your sequence has a deep MSA, prefer AlphaFold 2 or RoseTTAFold; if it doesn’t, ESMFold or OmegaFold may be the only honest option.

Complex and Interaction Prediction

AlphaFold-Multimer (Evans et al., 2021) extended AlphaFold 2 to multi-chain protein complexes, introducing the ipTM (interface predicted TM-score) metric for interface confidence.
AlphaFold 3 (Abramson et al., 2024) replaced the deterministic structure module with a diffusion-based generative head and extended the chemical scope to nucleic acids, ions, modified residues, and arbitrary small-molecule ligands.
Boltz-1 (Wohlwend et al., 2024, preprint) and Chai-1 (Chai Discovery et al., 2024, preprint) reproduced AlphaFold 3-class capabilities with permissive open-source licenses.
Boltz-2 (Passaro et al., 2025, preprint) added binding-affinity prediction alongside structure for the same chemistry.

Complex prediction has independent support, but the claim is narrower than “AlphaFold solves interactions.” Bryant and colleagues showed that AlphaFold2-style modeling can improve protein-protein interaction prediction when paired with appropriate ranking and confidence filters (Bryant et al., 2022). Interface confidence, stoichiometry, cellular localization, and expression context still need separate biological checks.

Why “AlphaFold 3-class” Matters Operationally

For a research program that needs on-premise inference, batch screening across millions of candidates, or commercial-use rights, AlphaFold 3’s restricted release is a blocker. Boltz and Chai are not “AlphaFold 3 alternatives” in a marketing sense: they are the actually-usable instances of the capability for most production settings. Read the licenses before scoping a project.

Reading Confidence Metrics

A confident-looking AlphaFold cartoon is not evidence. The confidence outputs are.

pLDDT (predicted Local Distance Difference Test)

A per-residue score, 0 to 100, predicting how well the local atomic environment matches what an experimental structure would show. The DeepMind-recommended bands are:

Band	Interpretation	Practical Use
> 90	Very high	Side-chain placement usable for docking, mutagenesis design
70-90	Confident	Backbone reliable; side chains often correct; treat ligand interactions cautiously
50-70	Low	Likely the correct fold; do not trust local details
< 50	Very low	Often intrinsically disordered; treat as “no structural prediction”

Disordered regions are a feature of biology, not a model failure. AlphaFold’s low-pLDDT regions correlate strongly with experimentally determined disorder (Akdel et al., 2022).
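The banding above can be applied programmatically before anyone looks at a rendered structure. The following is a minimal sketch, not an official tool: the function names are hypothetical, the handling of exact boundary values (90, 70, 50) is an assumption the table does not settle, and the parser assumes fixed-column PDB text in which AlphaFold stores pLDDT in the B-factor field.

```python
from collections import Counter
from typing import Dict, List

# Band labels mirror the rows of the table above (hypothetical identifiers).
BANDS = ("very_high", "confident", "low", "very_low")


def plddt_band(score: float) -> str:
    """Map one pLDDT value (0-100) to a band from the table above.

    Assumption: exactly 90 and exactly 70 count as "confident",
    and exactly 50 counts as "low".
    """
    if not 0.0 <= score <= 100.0:
        raise ValueError(f"pLDDT must be within 0-100, got {score}")
    if score > 90:
        return "very_high"
    if score >= 70:
        return "confident"
    if score >= 50:
        return "low"
    return "very_low"


def ca_plddt_from_pdb(pdb_text: str) -> List[float]:
    """Collect one pLDDT value per residue from C-alpha ATOM records.

    Assumes fixed-column PDB text with pLDDT in the B-factor field
    (columns 61-66), as in AlphaFold-generated PDB files.
    """
    scores = []
    for line in pdb_text.splitlines():
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            scores.append(float(line[60:66]))
    return scores


def band_fractions(scores: List[float]) -> Dict[str, float]:
    """Fraction of residues in each band, e.g. for a methods section."""
    if not scores:
        raise ValueError("no pLDDT values supplied")
    counts = Counter(plddt_band(s) for s in scores)
    return {band: counts[band] / len(scores) for band in BANDS}
```

Calling `band_fractions(ca_plddt_from_pdb(text))` on a model file gives a one-line summary of how much of the chain is usable at each level. A large "very_low" fraction is a statement about likely disorder, not a reason to delete residues.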

PAE (Predicted Aligned Error)

A matrix whose entry (i, j) gives the expected position error, in ångströms, at residue j when the predicted and true structures are aligned on residue i. The PAE matrix encodes domain structure and inter-domain confidence:

Block-diagonal low-PAE regions: Confident domains
Off-block-diagonal low-PAE regions: Confident relative orientation between domains
Off-block-diagonal high-PAE regions: Domain orientations are essentially unconstrained; the prediction shows one plausible arrangement, not a single determined global geometry

The most common AlphaFold misinterpretation is treating a high-pLDDT multi-domain prediction as a confident full-length structure when the PAE matrix says inter-domain orientation is undetermined.
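One way to make that check routine is to compare the mean PAE inside each domain with the mean PAE between domains. The sketch below is illustrative only: the function names, the toy matrix, and the 10 Å cutoff are assumptions chosen for the example, not published thresholds, and domain boundaries must come from your own annotation.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def mean_pae_block(pae: Matrix, rows: Sequence[int], cols: Sequence[int]) -> float:
    """Mean of pae[i][j] over i in rows and j in cols (0-based indices)."""
    values = [pae[i][j] for i in rows for j in cols]
    if not values:
        raise ValueError("empty residue selection")
    return sum(values) / len(values)


def interdomain_pae(pae: Matrix, dom_a: Sequence[int], dom_b: Sequence[int]) -> float:
    """Average the A->B and B->A blocks, because PAE is not symmetric."""
    return 0.5 * (mean_pae_block(pae, dom_a, dom_b) + mean_pae_block(pae, dom_b, dom_a))


def orientation_is_constrained(
    pae: Matrix,
    dom_a: Sequence[int],
    dom_b: Sequence[int],
    cutoff: float = 10.0,  # hypothetical cutoff in angstroms; tune per project
) -> bool:
    """True when the off-diagonal block is low enough to trust the arrangement."""
    return interdomain_pae(pae, dom_a, dom_b) <= cutoff


# Toy 4-residue example: two 2-residue domains, each confident on its own,
# with high error between them.
toy_pae = [
    [0.5, 1.0, 25.0, 27.0],
    [1.0, 0.5, 26.0, 28.0],
    [24.0, 25.0, 0.5, 1.2],
    [26.0, 27.0, 1.0, 0.5],
]
domain_1, domain_2 = [0, 1], [2, 3]
print(mean_pae_block(toy_pae, domain_1, domain_1))            # within domain 1
print(interdomain_pae(toy_pae, domain_1, domain_2))           # between domains
print(orientation_is_constrained(toy_pae, domain_1, domain_2))
```

In the toy case each domain is internally confident while the between-domain block is not, which is exactly the situation where a full-length cartoon overstates what the model knows.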

ipTM (Interface predicted TM-score): Complexes Only

For multi-chain predictions:

ipTM > 0.8: Interface likely correct
ipTM 0.6-0.8: Interface plausible, treat geometry as a hypothesis
ipTM < 0.6: Do not commit experimental design to this complex prediction

Combine with pTM (overall) and chain-pair ipTM when available.
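The tiers above, and one way of combining ipTM with pTM, can be sketched as follows. The 0.8/0.2 weighting is the model-confidence formula reported for AlphaFold-Multimer (Evans et al., 2021); the function names, tier labels, and the assignment of exact boundary values are assumptions made for this example.

```python
from typing import Dict, List, Tuple


def iptm_tier(iptm: float) -> str:
    """Map an ipTM value (0-1) to the decision tiers listed above.

    Assumption: exactly 0.8 and exactly 0.6 fall in the middle tier.
    """
    if not 0.0 <= iptm <= 1.0:
        raise ValueError(f"ipTM must be within 0-1, got {iptm}")
    if iptm > 0.8:
        return "interface_likely_correct"
    if iptm >= 0.6:
        return "interface_is_hypothesis"
    return "do_not_commit"


def ranking_confidence(iptm: float, ptm: float) -> float:
    """Weighted score 0.8 * ipTM + 0.2 * pTM, as in AlphaFold-Multimer."""
    return 0.8 * iptm + 0.2 * ptm


def rank_models(models: Dict[str, Tuple[float, float]]) -> List[str]:
    """Sort model names by ranking confidence, best first.

    `models` maps a model name to an (ipTM, pTM) pair.
    """
    return sorted(models, key=lambda name: ranking_confidence(*models[name]), reverse=True)
```

A high rank only orders models from the same run against each other. The tier of the top model still decides whether the interface is worth designing experiments around.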

What Structure Prediction Does Not Solve

Capability	Status	Why
Single-chain stable fold	Largely solved for well-folded domains	AlphaFold 2 + AFDB cover the majority of the structural proteome
Conformational ensembles	Open problem	A prediction is one state; many proteins occupy multiple functionally distinct conformations
Apo vs. holo geometry	Partially	Predictions tend to bias toward training distribution; cryptic pockets that only open with ligand are systematically missed
Allosteric and signaling states	Open	Activation states (GPCRs, kinases, ion channels) require additional context, sometimes co-folding with binding partners
Intrinsically disordered regions	Predictable as disordered, not as structures	The model honestly reports low pLDDT; reading this as “wrong” is the user’s error
Ligand chemistry beyond training	Limited	AF3-class systems extend chemical scope but novel ligand classes, covalent binders, and metal coordination remain hard
Membrane environments	Partial	Predictions can be plausible without an explicit membrane; lipid interactions and oligomeric state in-membrane often need experimental support
Hydrogen positions and water networks	No	Out of scope for these systems
Free-energy landscapes and kinetics	No	Static structures are not energetics; coupling with MD remains necessary
Mechanism	No	A structure is geometry, not function. Mechanism needs perturbation evidence.

A Specific Trap: “AlphaFold-Validated”

A vendor claim that a molecule was “AlphaFold-validated” against a target is, on its face, meaningless. AlphaFold predicts structure; it does not validate molecules. The honest version of the claim is: “we docked our molecule against an AlphaFold-predicted structure of the target and observed favorable scoring.” That claim, in turn, is one input to a validation campaign, not a substitute for binding, cell-based, and pharmacological evidence.

Decision Framework: Choosing a System

Situation	First Choice	Why
Single-chain protein with deep MSA, decision-relevant	AlphaFold 2 (or RoseTTAFold for open code)	Highest accuracy on the canonical task
Orphan / designed / fast-iteration single-chain	ESMFold (or OmegaFold)	MSA-free, fast
Protein-protein complex, moderate confidence acceptable	AlphaFold-Multimer	Established, well-characterized failure modes
Protein + ligand / NA / ion, cloud-acceptable	AlphaFold 3 web server	Highest-published accuracy on the AF3 task set
Protein + ligand / NA / ion, on-premise required	Boltz-1 or Chai-1	Open-source, AF3-class, deployable
Binding-affinity ranking matters	Boltz-2 (with caution and orthogonal validation)	Adds affinity head; still vendor-class evidence
Antibody-antigen interface	AlphaFold 3 / Boltz / Chai with explicit caveats	Antibody loops remain a known weakness for all systems
Membrane protein in lipid environment	Any of the above + MD + experimental data	No system claims to handle membrane partitioning natively

The deeper rule: choose the system by the biology and the access constraints, not by the brand of the model.
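For teams that want the table above as a reviewable rule rather than tribal knowledge, it can be encoded in a few lines. This is a sketch under stated assumptions: the `Task` fields and the order in which conditions are checked are hypothetical choices made for the example, and the antibody and membrane rows are left out because they call for caveats and extra data, not a different first call.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    """Hypothetical description of a prediction task (not a standard schema)."""
    has_non_protein: bool = False  # ligand, nucleic acid, or ion in the system
    multi_chain: bool = False      # protein-protein complex
    deep_msa: bool = True          # many homologs available
    on_premise: bool = False       # must run inside your own environment
    needs_affinity: bool = False   # binding-affinity ranking matters


def first_choice(task: Task) -> str:
    """Return the 'First Choice' entry from the decision table above."""
    if task.needs_affinity:
        return "Boltz-2 (with caution and orthogonal validation)"
    if task.has_non_protein:
        return "Boltz-1 or Chai-1" if task.on_premise else "AlphaFold 3 web server"
    if task.multi_chain:
        return "AlphaFold-Multimer"
    if task.deep_msa:
        return "AlphaFold 2 (or RoseTTAFold for open code)"
    return "ESMFold (or OmegaFold)"
```

For example, `first_choice(Task(has_non_protein=True, on_premise=True))` returns the open-source row. Writing the rule down this way makes the access constraint an explicit input instead of a surprise late in the project.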

The AlphaFold 3 Open-Access Episode

AlphaFold 3 was published in Nature in May 2024 (Abramson et al., 2024). The initial release made the model available through a hosted server with rate limits, restricted commercial use, and no training code. The structural biology and ML communities pushed back, both on reproducibility grounds (a Nature paper without runnable training code is a hard precedent) and on operational grounds (research programs cannot use a server-only model for batch screening, IP-sensitive work, or air-gapped environments).

The response was infrastructural, not rhetorical:

Chai-1 (October 2024, Chai Discovery et al., 2024, preprint): Open-source AF3-class biomolecular interaction model.
Boltz-1 (November 2024, Wohlwend et al., 2024, preprint): MIT release with permissive license; positioned explicitly as democratizing AF3-class capability.
Boltz-2 (2025, Passaro et al., 2025, preprint): Adds binding-affinity prediction.
OpenFold continued to evolve as an open-source AF2-class system.

In October 2024, the Nobel Prize in Chemistry was awarded jointly to Demis Hassabis and John Jumper for AlphaFold and to David Baker for computational protein design. Coverage in Nature’s news (Naddaf, 2025) and Engineering commentary (Palmer, 2025) chronicled both the scientific recognition and the open-access debate.

The operational lesson for research programs: licensing and code availability are first-class evaluation criteria, not afterthoughts. A model that cannot run in your environment is not usable for your program.

Confidence-Calibration Failure Modes

A list of pitfalls that recur in practice:

High pLDDT, wrong orientation. Two domains can each be confidently predicted while the PAE matrix shows inter-domain orientation is undetermined. The cartoon will look definitive; the science isn’t.
High ipTM, wrong interface in cells. A confident complex prediction can correspond to an interaction that does not occur at endogenous expression levels, in the relevant cellular compartment, or in the presence of competitor partners.
Training-distribution overlap. AlphaFold 2 was trained on PDB structures available at the time. Targets with close homologs in the training set are predicted with higher confidence than truly novel targets. Always check for near-neighbors in PDB before trusting a high score on a “novel” target.
Hallucinated ligand poses. AF3-class systems can produce confident-looking ligand placements for chemistry that is out-of-distribution. Treat ligand pose predictions as docking hypotheses, not as bound structures.
Disorder mis-read. Low-pLDDT regions are not “wrong”; they usually signal disorder or missing structural context. Removing them and re-rendering the structure is self-deception.
Single state, multi-state biology. GPCRs, kinases, ion channels, allosteric enzymes, and any protein with a meaningful conformational cycle are not described by a single structure, predicted or experimental.

Downstream Applications and Their Limits

Predicted structures feed into:

Variant effect prediction. AlphaMissense (Cheng et al., 2023) used AlphaFold-derived features to classify approximately 89% of the 71 million possible human missense variants as likely benign or likely pathogenic, leaving the remainder uncertain. The classifier is a research tool; clinical use requires the standard regulatory pathway and independent validation. See Variant Effect Prediction.
Protein design. RFdiffusion (Watson et al., 2023) and related systems generate protein backbones for specified geometries or binding targets; a separate sequence-design step then proposes sequences intended to fold to them. AlphaFold-predicted structures of the designs are an iteration-loop component, not a substitute for experimental characterization. See Protein Design and Engineering.
Ligand and cofactor transfer. AlphaFill enriches AlphaFold models by transferring ligands and cofactors from homologous experimental structures, which is useful for hypothesis generation but not equivalent to a measured bound structure (Hekkelman et al., 2023).
Structure-based drug design. Predicted structures support docking and pocket analysis but inherit all the conformational and ligand-chemistry limitations above. See Small Molecule Generation and ADMET.
Functional annotation. Predicted folds can suggest functional class by structural homology: useful for orphan proteins, but not a substitute for biochemical or genetic validation.

Open Technical Directions

Two directions stand out:

Generative protein modeling at scale. ESM-3 (Hayes et al., 2025), from EvolutionaryScale, is a multi-modal generative protein language model that models sequence, structure, and function in one architecture. The relevance to prediction (as opposed to design) is the convergence: structure-prediction models and design models share a representational substrate. A practical implication is that future “prediction” workflows will increasingly produce ensembles and conditional generations, not single structures.
Affinity-aware interaction models. Boltz-2’s affinity head (Passaro et al., 2025, preprint) is an early example of folding-class models reaching into binding-affinity territory. Treat early claims with the same skepticism applied to any structure-then-ranking pipeline: docking-score correlation with experimental affinity is weak across docking studies.

Common Questions

Did AlphaFold “solve protein folding”?

For the operational definition Moult and CASP used (predict static three-dimensional structure from sequence for a typical well-folded protein domain at experimental accuracy): yes, for the majority of cases at CASP14 (Jumper et al., 2021). For the biological problem people sometimes mean (predict function, mechanism, conformational dynamics, and binding from sequence): no. The capability change is substantial; the framing matters.

Should I still pursue experimental structure determination?

Yes, for state-specific structures, complexes that AlphaFold’s confidence flags as uncertain, ligand-bound geometries that matter for chemistry, and any structure that will appear in a regulatory submission. Predicted structures are excellent first drafts and excellent screening tools. They are not substitutes for cryo-EM, X-ray, or NMR data that will support a publication or a development candidate.

Is AlphaFold 3 actually better than AlphaFold 2 for proteins-only tasks?

For single-chain protein-only predictions, AlphaFold 2 remains the well-characterized baseline. AlphaFold 3’s value is in the expanded chemical scope (NAs, ligands, ions, modifications). For a single-chain monomer where you do not need that scope, AlphaFold 2 is the simpler, better-understood, more reproducible choice.

How do I cite AlphaFold predictions in a paper?

Cite the underlying paper for the system used (Jumper et al., 2021; Abramson et al., 2024), the database paper if using AFDB models (Varadi et al., 2024; Varadi et al., 2022), and report version and confidence metrics (pLDDT distribution, PAE for relevant residue pairs) in the methods section. Treat predicted structures the way you would treat any computational result that informed an experimental design.

Can I use AlphaFold 3 commercially?

Read the current license before assuming. The terms have changed since the May 2024 release. If commercial use is essential, the open-source AF3-class alternatives (Boltz-1, Chai-1) have permissive licenses that are easier to scope around.

Is structure prediction the same as protein design?

No. Prediction takes sequence and asks “what does this fold to?” Design takes a target geometry or function and asks “what sequence achieves this?” The systems share representational machinery and design pipelines often use prediction as a quality-control filter, but the validation evidence required is different. See Protein Design and Engineering.

What is demonstrated?

Single-chain structure prediction for many well-folded protein domains is the best-established demonstrated capability. AlphaFold 2, RoseTTAFold, ESMFold, and related systems make predicted structures routine research inputs, and the AlphaFold Protein Structure Database shows the capability at proteome scale. AlphaFold 3, Boltz, and Chai extend the demonstrated scope to biomolecular interactions, but with narrower confidence guarantees and greater dependence on ligand chemistry, interface quality, and access terms.

What is theoretical?

Theoretical claims include dependable ensemble prediction, reliable affinity ranking, routine induced-fit pocket discovery, and structure prediction that selects the correct biological state for a program without extra context. These are plausible directions because current systems already encode useful structural priors, but the evidence is not strong enough to treat static predicted coordinates as mechanism, binding thermodynamics, or state-specific biology.

What is beyond current capability?

Beyond current capability: replacing experimental structural biology for mechanism, regulatory filings, conformational cycles, ligand-bound geometry that drives chemistry, and water or protonation networks. Protein structure prediction supplies a hypothesis about geometry. It does not solve function, kinetics, free energy, cellular localization, expression, or pharmacology.

What would make this more promising?

Structure prediction becomes more promising with blinded benchmarks and prospective experiments that test complexes, ligands, conformational states, and affinity ranking against unseen biology and chemistry. Stronger evidence would include calibrated confidence for ligand poses, independent reproduction of AF3-class open models, and program-level reports showing that predicted structures change chemistry or mutagenesis decisions with measured success. Static benchmark accuracy alone is not enough for those claims.

What should researchers, biotech teams, funders, and program leaders do with this?

Read pLDDT and PAE before you read the cartoon. Pretty structures persuade; metrics protect you.
For consequential decisions, compute multiple predictions (different seeds, MSA depths, or systems) and look at consensus. A prediction that flips under reasonable perturbation is a low-confidence prediction regardless of pLDDT.
For drug-discovery work, do not commit chemistry to a predicted pocket without (a) an experimental structure of a homolog or (b) explicit biochemical evidence that the pocket exists. Cryptic pockets are systematically under-represented in predicted structures.
For antibody-antigen work, treat any single prediction as a hypothesis-grade interface until validated by mutagenesis, alanine scanning, or epitope mapping.
For conformational-cycle proteins (GPCRs, kinases, transporters), predict multiple states (different templates, conformation-specific MSAs) and validate against any available state-specific experimental structures before functional interpretation.
License diligence is part of method selection. Server-only models are not a deployable infrastructure; permissive open-source releases (Boltz, Chai, OpenFold) often are.
Cite the underlying paper and the model version. “AlphaFold” without a version is ambiguous; 2, 2.3, 3, and Multimer are different systems with different scope and different licenses.
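The consensus check recommended above (multiple seeds, MSA depths, or systems) can be approximated without any structural superposition by comparing internal distance matrices. The sketch below is one simple option, not a standard protocol: the function names are hypothetical, inputs are assumed to be C-alpha coordinates for the same residues in the same order, and the metric cannot distinguish a structure from its mirror image.

```python
import math
from itertools import combinations
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def drmsd(model_a: Sequence[Coord], model_b: Sequence[Coord]) -> float:
    """Distance-matrix RMSD between two models of the same residues.

    Compares every pairwise C-alpha distance in model_a with the matching
    distance in model_b, so no alignment step is needed.
    """
    n = len(model_a)
    if n != len(model_b):
        raise ValueError("models must contain the same number of residues")
    if n < 2:
        raise ValueError("need at least two residues")
    squared = [
        (math.dist(model_a[i], model_a[j]) - math.dist(model_b[i], model_b[j])) ** 2
        for i in range(n)
        for j in range(i + 1, n)
    ]
    return math.sqrt(sum(squared) / len(squared))


def max_pairwise_drmsd(models: List[Sequence[Coord]]) -> float:
    """Largest disagreement among all pairs of predictions."""
    if len(models) < 2:
        raise ValueError("need at least two models to assess consensus")
    return max(drmsd(a, b) for a, b in combinations(models, 2))


# Toy example: a 3-residue model, a translated copy, and a perturbed copy.
base = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
shifted = [(x + 5.0, y + 5.0, z + 5.0) for x, y, z in base]
bent = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0)]
print(drmsd(base, shifted))                      # rigid shift: no disagreement
print(max_pairwise_drmsd([base, shifted, bent])) # bent model disagrees
```

What counts as "too much" disagreement depends on the decision at stake. The point of the sketch is only that a prediction which moves substantially between runs should be treated as low confidence whatever its pLDDT.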

Cross-References

AI for the Life Sciences: Evidence framework, chapter reference, and utility standard
Foundation Models for Biology: ESMFold and the protein language model lineage
Protein Design and Engineering: RFdiffusion, generation pipelines, and design-validate-iterate loops
Antibody and Biologic Design: Antibody-specific limitations of generic structure models
Variant Effect Prediction: AlphaMissense and structure-informed variant classification
Small Molecule Generation and ADMET: Structure-based design downstream of structure prediction
Evaluation Principles for Life Sciences AI: How to design a validation plan that turns a prediction into evidence
Benchmarks for Bio AI: CASP, CAMEO, and the limits of static benchmarks