Protein Structure Prediction
Protein structure prediction crossed a research threshold at CASP14, when AlphaFold 2 produced single-chain predictions whose median backbone accuracy was comparable to experimental structures (Jumper et al., 2021). Three years later, the AlphaFold Protein Structure Database covers over 214 million predicted structures (Varadi et al., 2023): effectively the predicted structural proteome of life. The hard questions are no longer whether a model can fold a protein; they are which molecular state the prediction represents, what ligand and partner geometry it captures, and what an experiment still has to prove.
This chapter treats protein structure prediction as an experimental decision aid, not a substitute for measurement. You will learn to:
- Distinguish single-chain prediction from biomolecular-interaction prediction, and explain why the confidence guarantees differ
- Read pLDDT, PAE, and ipTM as decision metrics rather than truth labels
- Choose between AlphaFold 2/3, ESMFold, RoseTTAFold, OmegaFold, Boltz, and Chai based on the biological question and access constraints
- Identify the failure modes that persist after high-confidence prediction: disorder, multiple conformations, induced fit, ligand chemistry, allostery
- Frame structure prediction inside a validation plan (docking, mutagenesis, experimental structure, function assay) before committing program decisions
- Evaluate vendor claims about “AlphaFold-powered” pipelines against published evidence
- Navigate the AlphaFold 3 open-access controversy and the open-source alternatives (Boltz, Chai, OpenFold)
Prerequisites: AI for the Life Sciences is recommended; Foundation Models for Biology is helpful for the ESMFold and ESM-3 sections.
Introduction: From Grand Challenge to Routine Input
December 2020, CASP14 (online): John Moult announces that AlphaFold 2 has achieved a median backbone accuracy of 0.96 Å (Cα RMSD at 95% residue coverage) across target domains in the biennial Critical Assessment of protein Structure Prediction. The next-best method reached 2.8 Å on the same measure. The protein-folding grand challenge, set 50 years earlier, is, in the sense Moult and others meant it, solved for most cases (Jumper et al., 2021).
July 2022: DeepMind and EMBL-EBI expand the AlphaFold Protein Structure Database, first launched in July 2021 (Varadi et al., 2021), to predictions for nearly every cataloged UniProt protein. The database now holds over 214 million predicted structures (Varadi et al., 2023).
May 2024: AlphaFold 3 extends prediction to nucleic acids, ions, small-molecule ligands, and modified residues using a diffusion-based generative architecture (Abramson et al., 2024). The initial release is a rate-limited web server, with no downloadable code or model weights.
October 2024: The Royal Swedish Academy of Sciences awards the 2024 Nobel Prize in Chemistry, one half to David Baker for computational protein design (including the Rosetta lineage) and the other half jointly to Demis Hassabis and John Jumper for protein structure prediction (AlphaFold).
November 2024: MIT’s Barzilay and Jaakkola groups release Boltz-1, an open-source AlphaFold 3-class biomolecular interaction model (Wohlwend et al., 2024, preprint). Chai-1 (Chai Discovery et al., 2024, preprint), released earlier in 2024, had already shown that an AF3-class model could be distributed openly.
These are not just modeling milestones. They are a change in the default operating assumption of structural biology: for many proteins, a usable structural model is available before any experiment is run.
The chapter that follows is organized around four questions a researcher actually faces:
- Which system should I use, and why?
- What does the confidence metric actually mean for my decision?
- What does the prediction not tell me?
- What experiment do I still owe before I commit a program decision?
The Architecture Lineage
Single-Chain Prediction (AlphaFold 2 and Successors)
AlphaFold 2 (Jumper et al., 2021) combined three ideas:
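- Multiple sequence alignment (MSA) input. Evolutionary covariation across homologs encodes structural constraints. AlphaFold 2 learns directly from the MSA through its own representation rather than from precomputed coevolution statistics.
- Triangle operations. Triangle multiplicative updates and triangle attention over pairwise residue features enforce geometric self-consistency (the relation between residues i and j must agree with their relations to every third residue k), stacked across 48 Evoformer blocks.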
- End-to-end differentiable structure module. A direct geometric prediction of backbone frames and side-chain rotamers, trained with structural and auxiliary losses.
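The triangle idea is easier to see in code. Below is a toy numpy sketch of one “outgoing edges” triangular multiplicative update, the simpler of the two triangle operations: the pair feature for residues (i, j) is updated from the features of edges (i, k) and (j, k) over every third residue k. It is a simplified illustration under stated assumptions, not DeepMind’s implementation: the weights are random, the dictionary keys are hypothetical names, biases and learned layer-norm parameters are omitted, and the “incoming edges” and triangle-attention variants are left out.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Normalize over the channel axis; learned scale and shift are omitted in this sketch.
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def triangle_update_outgoing(z, w):
    """Simplified 'outgoing edges' triangular multiplicative update.

    z: pair representation of shape (N, N, c).
    w: dict of projection matrices (hypothetical names, no biases).
    Returns z plus a gated update built from sum_k a[i, k] * b[j, k].
    """
    z_ln = layer_norm(z)
    a = sigmoid(z_ln @ w["a_gate"]) * (z_ln @ w["a_proj"])   # (N, N, h)
    b = sigmoid(z_ln @ w["b_gate"]) * (z_ln @ w["b_proj"])   # (N, N, h)
    # For each pair (i, j), combine the two other edges of every triangle (i, j, k).
    t = np.einsum("ikh,jkh->ijh", a, b)                      # (N, N, h)
    g = sigmoid(z_ln @ w["out_gate"])                        # (N, N, c)
    return z + g * (layer_norm(t) @ w["out_proj"])           # residual update

rng = np.random.default_rng(0)
n_res, c, h = 8, 16, 4
w = {
    "a_proj": rng.normal(size=(c, h)), "a_gate": rng.normal(size=(c, h)),
    "b_proj": rng.normal(size=(c, h)), "b_gate": rng.normal(size=(c, h)),
    "out_gate": rng.normal(size=(c, c)), "out_proj": rng.normal(size=(h, c)),
}
z = rng.normal(size=(n_res, n_res, c))
z_new = triangle_update_outgoing(z, w)
print(z_new.shape)  # (8, 8, 16)
```

Because the update only sums over a residue index, relabeling the residues permutes the output in the same way; stacking many such pair operations lets distance-like information propagate consistently through the pair representation.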
The system was open-sourced shortly after publication. RoseTTAFold (Baek et al., 2021) shipped from the Baker lab in parallel with a different “three-track” architecture and was equally consequential because it established that AlphaFold 2’s accuracy was not architecture-locked.
MSA-Free Prediction (ESMFold and OmegaFold)
For orphan proteins (no homologs), de novo designs (no evolutionary signal), or speed-critical workflows, MSA generation is the bottleneck. Two responses emerged:
- ESMFold (Lin et al., 2023). Meta’s ESMFold predicts structure from sequence alone, built on the 3-billion-parameter variant of the ESM-2 protein language model. Accuracy is below AlphaFold 2 on average, but skipping the MSA search makes it far faster end to end, which matters most when MSAs are weak or expensive to build.
- OmegaFold (Wu et al., 2022, preprint). Single-sequence prediction with comparable goals, useful particularly for designed proteins.
The practical rule: if your sequence has a deep MSA, prefer AlphaFold 2 or RoseTTAFold; if it doesn’t, ESMFold or OmegaFold may be the only honest option.
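The rule above turns on what counts as a “deep” MSA. One widely used way to quantify depth, borrowed from coevolution-based contact prediction, is the effective number of sequences (Neff): each sequence is down-weighted by how many near-duplicates it has, so a thousand copies of one homolog count roughly once. The sketch below assumes a pre-aligned MSA given as equal-length strings and an 80% identity cutoff; both that cutoff and any Neff threshold for calling an MSA “deep” are conventions to set for your own workflow, not values from this chapter’s sources. Gaps are compared like residues here, a simplification many tools handle differently.

```python
import numpy as np

def effective_msa_depth(msa, identity_threshold=0.8):
    """Neff: sum of per-sequence weights 1 / (number of sequences, itself included,
    with fractional identity >= identity_threshold to it).

    msa: list of aligned sequences of equal length; '-' marks gaps.
    """
    arr = np.array([list(seq) for seq in msa])                  # (n_seqs, length) characters
    # Fraction of alignment columns where each pair of sequences matches.
    identity = (arr[:, None, :] == arr[None, :, :]).mean(axis=-1)
    n_neighbors = (identity >= identity_threshold).sum(axis=1)  # counts the sequence itself
    return float((1.0 / n_neighbors).sum())

msa = [
    "ACDEFGHIKL",   # query
    "ACDEFGHIKL",   # exact duplicate of the query
    "ACDEFGHIKV",   # 90% identical to the query
    "MCDQFGHWKL",   # 70% identical: counts as a distinct sequence
]
print(round(effective_msa_depth(msa), 6))  # 2.0
```

Four raw sequences collapse to an Neff of 2 because three of them are near-duplicates. In practice the MSA would come from a search tool such as HHblits or JackHMMER; the sketch only shows the reweighting idea.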
Complex and Interaction Prediction
- AlphaFold-Multimer (Evans et al., 2021, preprint) extended AlphaFold 2 to multi-chain protein complexes, introducing the ipTM (interface predicted TM-score) metric for interface confidence.
- AlphaFold 3 (Abramson et al., 2024) replaced the deterministic structure module with a diffusion-based generative head and extended the chemical scope to nucleic acids, ions, modified residues, and arbitrary small-molecule ligands.
- Boltz-1 (Wohlwend et al., 2024, preprint) and Chai-1 (Chai Discovery et al., 2024, preprint) reproduced AlphaFold 3-class capabilities with permissive open-source licenses.
- Boltz-2 (Passaro et al., 2025, preprint) added binding-affinity prediction alongside structure for the same chemistry.
For a research program that needs on-premise inference, batch screening across millions of candidates, or commercial-use rights, AlphaFold 3’s restricted release is a blocker. Boltz and Chai are not “AlphaFold 3 alternatives” in a marketing sense: they are the actually usable instances of the capability for most production settings. Read the licenses before scoping a project.
Reading Confidence Metrics
A confident-looking AlphaFold cartoon is not evidence. The confidence outputs are.
pLDDT (predicted Local Distance Difference Test)
A per-residue score, 0 to 100, predicting how well the local atomic environment matches what an experimental structure would show. The DeepMind-recommended bands are:
| Band | Interpretation | Practical Use |
|---|---|---|
| > 90 | Very high | Side-chain placement usable for docking, mutagenesis design |
| 70-90 | Confident | Backbone reliable; side chains often correct; treat ligand interactions cautiously |
| 50-70 | Low | Likely the correct fold; do not trust local details |
| < 50 | Very low | Often intrinsically disordered; treat as “no structural prediction” |
Disordered regions are a feature of biology, not a model failure. AlphaFold’s low-pLDDT regions correlate strongly with experimentally determined disorder (Akdel et al., 2022, a widely cited community assessment of AlphaFold 2 applications).
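AlphaFold-style PDB files store each residue’s pLDDT in the B-factor column, which makes the bands in the table above easy to summarize per model before anyone looks at the cartoon. The sketch below reads the B-factor of each Cα atom and reports the fraction of residues in each band. Two assumptions: values exactly on a boundary (90, 70, 50) are assigned to the lower band, and the input is PDB format rather than mmCIF. The toy_ca_line helper is hypothetical and exists only to make the example self-contained.

```python
from collections import Counter

BANDS = ("very high", "confident", "low", "very low")

def plddt_band(score):
    """Assign a pLDDT value to a band; exact boundaries go to the lower band (an assumption)."""
    if score > 90:
        return "very high"
    if score > 70:
        return "confident"
    if score > 50:
        return "low"
    return "very low"

def plddt_per_residue(pdb_text):
    """Read per-residue pLDDT from the B-factor field (columns 61-66) of CA atoms."""
    scores = {}
    for line in pdb_text.splitlines():
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            key = (line[21], int(line[22:26]))   # (chain ID, residue number)
            scores[key] = float(line[60:66])
    return scores

def band_fractions(scores):
    """Fraction of residues falling in each pLDDT band."""
    counts = Counter(plddt_band(s) for s in scores.values())
    total = len(scores)
    return {band: counts[band] / total for band in BANDS}

def toy_ca_line(serial, resnum, plddt):
    # Hypothetical helper: one CA record in PDB fixed columns, dummy coordinates.
    return (f"ATOM  {serial:5d}  CA  ALA A{resnum:4d}    "
            f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}{1.00:6.2f}{plddt:6.2f}")

pdb_text = "\n".join(toy_ca_line(i, i, s)
                     for i, s in enumerate([95.2, 88.0, 61.5, 32.1], start=1))
print(band_fractions(plddt_per_residue(pdb_text)))
# {'very high': 0.25, 'confident': 0.25, 'low': 0.25, 'very low': 0.25}
```

Reporting these fractions (or the full pLDDT distribution) in a methods section is one concrete way to follow the citation advice later in this chapter.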
PAE (Predicted Aligned Error)
A matrix giving the expected position error (in Å) at residue j when the predicted and true structures are aligned on residue i. The PAE matrix encodes domain structure and inter-domain confidence:
- Block-diagonal low-PAE regions: Confident domains
- Off-block-diagonal low-PAE regions: Confident relative orientation between domains
- Off-block-diagonal high-PAE regions: Domain orientations are essentially unconstrained: the prediction is not a single global geometry, it is one plausible arrangement
The most common AlphaFold misinterpretation is treating a high-pLDDT multi-domain prediction as a confident full-length structure when the PAE matrix says inter-domain orientation is undetermined.
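This misreading can be caught mechanically: average the PAE in the off-diagonal blocks between two domains and compare it with the within-domain blocks. The sketch below assumes the PAE matrix is already loaded as an N × N numpy array (how you load it depends on the file format your predictor emits) and that domain boundaries are known; the 10 Å cutoff is an illustrative assumption, not a published threshold. Because PAE is not symmetric, both off-diagonal blocks are averaged.

```python
import numpy as np

def block_mean(pae, rows, cols):
    """Mean PAE for aligned-on residues `rows` and scored residues `cols` (0-based, half-open)."""
    return float(pae[slice(*rows), slice(*cols)].mean())

def interdomain_call(pae, dom_a, dom_b, cutoff=10.0):
    """Return (mean inter-domain PAE, verdict); the cutoff is illustrative only."""
    # pae[i, j] is the expected error at residue j when aligned on residue i,
    # so the a->b and b->a blocks can differ; average both.
    inter = (block_mean(pae, dom_a, dom_b) + block_mean(pae, dom_b, dom_a)) / 2
    if inter < cutoff:
        return inter, "relative orientation constrained"
    return inter, "relative orientation undetermined"

# Toy two-domain protein: residues 0-49 and 50-99, each confident internally.
pae = np.full((100, 100), 25.0)
pae[:50, :50] = 2.0
pae[50:, 50:] = 2.0
print(interdomain_call(pae, (0, 50), (50, 100)))
# (25.0, 'relative orientation undetermined')
```

In this toy case every residue could have very high pLDDT, yet the verdict says the full-length arrangement is not a confident prediction.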
ipTM (Interface predicted TM-score): Complexes Only
For multi-chain predictions:
- ipTM > 0.8: Interface likely correct
- ipTM 0.6-0.8: Interface plausible, treat geometry as a hypothesis
- ipTM < 0.6: Do not commit experimental design to this complex prediction
Combine with pTM (overall) and chain-pair ipTM when available.
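The ipTM bands and the advice to combine ipTM with pTM can be written as a small helper. AlphaFold-Multimer ranks its models by 0.8 × ipTM + 0.2 × pTM (Evans et al., 2021), which weights the interface heavily; the sketch reuses that weighting. How exact boundary values (0.6 or 0.8) are binned is an assumption here, and the model names and scores in the example are hypothetical.

```python
def iptm_call(iptm):
    """Map an ipTM value to the decision bands above (boundary handling is an assumption)."""
    if iptm > 0.8:
        return "interface likely correct"
    if iptm >= 0.6:
        return "interface plausible; treat geometry as a hypothesis"
    return "do not commit experimental design"

def multimer_confidence(iptm, ptm):
    """AlphaFold-Multimer-style ranking score: interface-weighted blend of ipTM and pTM."""
    return 0.8 * iptm + 0.2 * ptm

# Hypothetical outputs for two models of the same complex: (ipTM, pTM).
models = {"model_A": (0.85, 0.80), "model_B": (0.55, 0.88)}
ranked = sorted(models, key=lambda m: multimer_confidence(*models[m]), reverse=True)
for name in ranked:
    iptm, ptm = models[name]
    print(name, round(multimer_confidence(iptm, ptm), 3), iptm_call(iptm))
```

Here model_B has the higher pTM but the lower interface score, so it ranks second and lands in the “do not commit” band: a confident overall fold does not rescue an unconfident interface.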
What Structure Prediction Does Not Solve
| Capability | Status | Why |
|---|---|---|
| Single-chain stable fold | Largely solved for well-folded domains | AlphaFold 2 + AFDB cover the majority of the structural proteome |
| Conformational ensembles | Open problem | A prediction is one state; many proteins occupy multiple functionally distinct conformations |
| Apo vs. holo geometry | Partially | Predictions tend to favor conformations common in the training data; cryptic pockets that open only when a ligand binds are systematically missed |
| Allosteric and signaling states | Open | Activation states (GPCRs, kinases, ion channels) require additional context: sometimes co-folding with binding partners |
| Intrinsically disordered regions | Predictable as disordered, not as structures | The model honestly reports low pLDDT; reading this as “wrong” is the user’s error |
| Ligand chemistry beyond training | Limited | AF3-class systems extend chemical scope but novel ligand classes, covalent binders, and metal coordination remain hard |
| Membrane environments | Partial | Predictions can be plausible without an explicit membrane; lipid interactions and oligomeric state in-membrane often need experimental support |
| Hydrogen positions and water networks | No | Out of scope for these systems |
| Free-energy landscapes and kinetics | No | Static structures are not energetics; coupling with MD remains necessary |
| Mechanism | No | A structure is geometry, not function. Mechanism needs perturbation evidence. |
A vendor claim that a molecule was “AlphaFold-validated” against a target is, on its face, meaningless. AlphaFold predicts structure; it does not validate molecules. The honest version of the claim is: “we docked our molecule against an AlphaFold-predicted structure of the target and observed favorable scoring.” That claim, in turn, is one input to a validation campaign: not a substitute for binding, cell-based, and pharmacological evidence.
Decision Framework: Choosing a System
| Situation | First Choice | Why |
|---|---|---|
| Single-chain protein with deep MSA, decision-relevant | AlphaFold 2 (or RoseTTAFold for open code) | Highest accuracy on the canonical task |
| Orphan / designed / fast-iteration single-chain | ESMFold (or OmegaFold) | MSA-free, fast |
| Protein-protein complex, moderate confidence acceptable | AlphaFold-Multimer | Established, well-characterized failure modes |
| Protein + ligand / NA / ion, cloud-acceptable | AlphaFold 3 web server | Highest published accuracy on the AF3 task set |
| Protein + ligand / NA / ion, on-premise required | Boltz-1 or Chai-1 | Open-source, AF3-class, deployable |
| Binding-affinity ranking matters | Boltz-2 (with caution and orthogonal validation) | Adds an affinity head; supporting evidence so far comes mainly from the developers’ own benchmarks |
| Antibody-antigen interface | AlphaFold 3 / Boltz / Chai with explicit caveats | Antibody loops remain a known weakness for all systems |
| Membrane protein in lipid environment | Any of the above + MD + experimental data | No system claims to handle membrane partitioning natively |
The deeper rule: choose the system by the biology and the access constraints, not by the brand of the model.
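One way to make the table auditable inside a pipeline is to encode it as an explicit rule function, so the reason for each model choice is recorded next to the prediction. The sketch below is one reading of the table, not a canonical ordering: the PredictionTask fields are hypothetical, and the precedence (membrane, then antibody, then affinity, then chemistry and access, then chain count and MSA depth) is an assumption you should adapt to your program.

```python
from dataclasses import dataclass

@dataclass
class PredictionTask:
    """Hypothetical description of a structure-prediction job."""
    n_chains: int = 1
    has_non_protein: bool = False      # ligand, nucleic acid, or ion present
    on_premise_required: bool = False
    deep_msa: bool = True
    affinity_ranking: bool = False
    antibody_antigen: bool = False
    membrane_protein: bool = False

def first_choice(task):
    """Return (system, reason) following one reading of the decision table."""
    if task.membrane_protein:
        return ("any of the above + MD + experimental data",
                "no system claims to handle membrane partitioning natively")
    if task.antibody_antigen:
        return ("AlphaFold 3 / Boltz / Chai with explicit caveats",
                "antibody loops remain a known weakness for all systems")
    if task.affinity_ranking:
        return ("Boltz-2 with orthogonal validation", "adds an affinity head")
    if task.has_non_protein:
        if task.on_premise_required:
            return ("Boltz-1 or Chai-1", "open-source, AF3-class, deployable")
        return ("AlphaFold 3 web server", "highest published accuracy on the AF3 task set")
    if task.n_chains > 1:
        return ("AlphaFold-Multimer", "established, well-characterized failure modes")
    if task.deep_msa:
        return ("AlphaFold 2 (or RoseTTAFold)", "highest accuracy on the canonical task")
    return ("ESMFold (or OmegaFold)", "MSA-free, fast")

print(first_choice(PredictionTask(has_non_protein=True, on_premise_required=True)))
```

Logging the returned reason alongside each prediction makes it easier to revisit choices when licenses or model versions change.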
The AlphaFold 3 Open-Access Episode
AlphaFold 3 was published in Nature in May 2024 (Abramson et al., 2024). The initial release made the model available through a hosted server with rate limits, restricted commercial use, and no runnable code. The structural biology and ML communities pushed back, both on reproducibility grounds (a Nature paper without runnable code sets a hard precedent) and on operational grounds (research programs cannot use a server-only model for batch screening, IP-sensitive work, or air-gapped environments). In November 2024, DeepMind released the inference code, with model weights available on request for non-commercial academic use; training code was not released.
The response was infrastructural, not rhetorical:
- Chai-1 (October 2024, Chai Discovery et al., 2024, preprint): Open-source AF3-class biomolecular interaction model.
- Boltz-1 (November 2024, Wohlwend et al., 2024, preprint): MIT release with permissive license; positioned explicitly as democratizing AF3-class capability.
- Boltz-2 (2025, Passaro et al., 2025, preprint): Adds binding-affinity prediction.
- OpenFold (Ahdritz et al., 2024) continued to develop as an open-source, trainable AF2-class system.
In October 2024, the Nobel Prize in Chemistry was awarded jointly to Demis Hassabis and John Jumper for AlphaFold and to David Baker for computational protein design. Coverage in Nature’s news (Naddaf, 2025) and Engineering commentary (Palmer, 2025) chronicled both the scientific recognition and the open-access debate.
The operational lesson for research programs: licensing and code availability are first-class evaluation criteria, not afterthoughts. A model that cannot run in your environment is, for your program, not the state of the art.
Confidence-Calibration Failure Modes
A list of pitfalls that recur in practice:
- High pLDDT, wrong orientation. Two domains can each be confidently predicted while the PAE matrix shows inter-domain orientation is undetermined. The cartoon will look definitive; the science isn’t.
- High ipTM, wrong interface in cells. A confident complex prediction can correspond to an interaction that does not occur at endogenous expression levels, in the relevant cellular compartment, or in the presence of competitor partners.
- Training-distribution overlap. AlphaFold 2 was trained on PDB structures available at the time. Targets with close homologs in the training set are predicted with higher confidence than truly novel targets. Always check for near-neighbors in PDB before trusting a high score on a “novel” target.
- Hallucinated ligand poses. AF3-class systems can produce confident-looking ligand placements for chemistry that is out-of-distribution. Treat ligand pose predictions as docking hypotheses, not as bound structures.
- Disorder misread. Low-pLDDT regions are not “wrong”: they are predictions of disorder. Trimming them so the rendered model looks tidy is self-deception.
- Single state, multi-state biology. GPCRs, kinases, ion channels, allosteric enzymes, and any protein with a meaningful conformational cycle are not described by a single structure, predicted or experimental.
Downstream Applications and Their Limits
Predicted structures feed into:
- Variant effect prediction. AlphaMissense (Cheng et al., 2023), built on an AlphaFold-derived architecture, classifies approximately 89% of all ~71 million possible human missense variants as likely benign or likely pathogenic. The classifier is a research tool; clinical use requires the standard regulatory pathway and independent validation. See Variant Effect Prediction.
- Protein design. RFdiffusion (Watson et al., 2023) and related systems generate protein backbones for specified geometries or functions; sequences for those backbones are then designed with tools such as ProteinMPNN. AlphaFold-predicted structures of the designs are an iteration-loop component, not a substitute for experimental characterization. See Protein Design and Engineering.
- Structure-based drug design. Predicted structures support docking and pocket analysis but inherit all the conformational and ligand-chemistry limitations above. See Small Molecule Generation and ADMET.
- Functional annotation. Predicted folds can suggest functional class by structural homology: useful for orphan proteins, but not a substitute for biochemical or genetic validation.
Frontier Directions
Two directions are worth tracking:
- Generative protein modeling at scale. ESM-3 (Hayes et al., 2024, preprint), from EvolutionaryScale, is a multi-modal generative protein language model reasoning over sequence, structure, and function simultaneously. The relevance to prediction (as opposed to design) is the convergence: structure-prediction models and design models share a representational substrate. A practical implication is that future “prediction” workflows will increasingly produce ensembles and conditional generations, not single structures.
- Affinity-aware interaction models. Boltz-2’s affinity head (Passaro et al., 2025, preprint) is an early example of folding-class models reaching into binding-affinity territory. Treat early claims with the same skepticism applied to any structure-then-ranking pipeline: docking-score correlation with experimental affinity is famously weak across the docking literature.
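Checking an affinity-aware or docking-based ranking usually comes down to a rank correlation between predicted scores and measured affinities on compounds the model has not seen. A minimal pure-Python Spearman correlation is sketched below (scipy.stats.spearmanr does the same job); the compound values are hypothetical. Mind the sign convention: docking scores are usually “lower is better” while pKd is “higher is better”, so a useful docking score correlates negatively with pKd.

```python
def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman correlation: Pearson correlation of average ranks (undefined if either input is constant)."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5

predicted = [1.0, 2.0, 3.0, 4.0, 5.0]        # hypothetical scores, higher = predicted tighter binding
measured_pkd = [5.1, 4.8, 6.3, 6.0, 7.2]     # hypothetical measured pKd values
print(round(spearman(predicted, measured_pkd), 3))  # 0.8
```

With five compounds, a correlation of 0.8 is easy to reach by chance; evaluation sets need to be large and chemically distinct from the training data before a ranking claim means much.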
Practice Notes
- Read pLDDT and PAE before you read the cartoon. Pretty structures persuade; metrics protect you.
- For consequential decisions, compute multiple predictions (different seeds, MSA depths, or systems) and look at consensus. A prediction that flips under reasonable perturbation is a low-confidence prediction regardless of pLDDT.
- For drug-discovery work, do not commit chemistry to a predicted pocket without (a) an experimental structure of a homolog or (b) explicit biochemical evidence that the pocket exists. Cryptic pockets are systematically under-represented in predicted structures.
- For antibody-antigen work, treat any single prediction as a hypothesis-grade interface until validated by mutagenesis, alanine scanning, or epitope mapping.
- For conformational-cycle proteins (GPCRs, kinases, transporters), predict multiple states (different templates, conformation-specific MSAs) and validate against any available state-specific experimental structures before functional interpretation.
- License diligence is part of method selection. Server-only models are not deployable infrastructure; permissive open-source releases (Boltz, Chai, OpenFold) often are.
- Cite the underlying paper and the model version. “AlphaFold” without a version is ambiguous; 2, 2.3, 3, and Multimer are different systems with different scope and different licenses.
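The consensus check in the second note can be made quantitative by superposing the Cα coordinates of each pair of predictions (from different seeds, MSA depths, or systems) with the Kabsch algorithm and computing the RMSD after superposition. The sketch assumes coordinates are already extracted as (N, 3) numpy arrays in the same residue order; the toy coordinates are random stand-ins, and what RMSD counts as a “flip” is a judgment call this chapter does not fix.

```python
import numpy as np

def kabsch_rmsd(p, q):
    """RMSD between two (N, 3) coordinate sets after optimal rigid superposition (Kabsch)."""
    p = p - p.mean(axis=0)
    q = q - q.mean(axis=0)
    h = p.T @ q                                     # 3x3 cross-covariance
    u, _, vt = np.linalg.svd(h)
    # Flip the last axis if needed so the result is a proper rotation, not a reflection.
    d = -1.0 if np.linalg.det(vt.T @ u.T) < 0 else 1.0
    r = vt.T @ np.diag([1.0, 1.0, d]) @ u.T        # rotation mapping p onto q
    diff = p @ r.T - q
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))

def pairwise_rmsd(models):
    """Symmetric matrix of superposed RMSDs between all pairs of models."""
    n = len(models)
    out = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            out[i, j] = out[j, i] = kabsch_rmsd(models[i], models[j])
    return out

# Toy stand-ins; a real pipeline would load CA coordinates from each model file.
rng = np.random.default_rng(0)
seed_a = rng.normal(scale=10.0, size=(50, 3))
angle = 0.4
rot_z = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                  [np.sin(angle),  np.cos(angle), 0.0],
                  [0.0, 0.0, 1.0]])
seed_b = seed_a @ rot_z.T + 3.0                               # same structure, moved rigidly
seed_c = seed_a + rng.normal(scale=1.5, size=seed_a.shape)    # genuinely different coordinates
print(np.round(pairwise_rmsd([seed_a, seed_b, seed_c]), 2))
```

Near-zero off-diagonal entries mean the models agree once rigid-body motion is removed; one model far from the rest is the “flip under perturbation” the note warns about. For multi-domain proteins, a large global RMSD between otherwise confident models often carries the same message as high inter-domain PAE.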
Common Questions
Did AlphaFold “solve protein folding”?
For the operational definition Moult and CASP used (predicting the static three-dimensional structure of a typical well-folded protein domain from sequence at experimental accuracy), yes, for the majority of cases at CASP14 (Jumper et al., 2021). For the biological problem people sometimes mean (predicting function, mechanism, conformational dynamics, and binding from sequence), no. The capability change is substantial; the framing matters.
Should I still pursue experimental structure determination?
Yes, for state-specific structures, complexes that AlphaFold’s confidence flags as uncertain, ligand-bound geometries that matter for chemistry, and any structure that will appear in a regulatory submission. Predicted structures are excellent first drafts and excellent screening tools. They are not substitutes for cryo-EM, X-ray, or NMR data that will support a publication or a development candidate.
Is AlphaFold 3 actually better than AlphaFold 2 for proteins-only tasks?
For single-chain protein-only predictions, AlphaFold 2 remains the well-characterized baseline. AlphaFold 3’s value is in the expanded chemical scope (NAs, ligands, ions, modifications). For a single-chain monomer where you do not need that scope, AlphaFold 2 is the simpler, better-understood, more reproducible choice.
How do I cite AlphaFold predictions in a paper?
Cite the underlying paper for the system used (Jumper et al., 2021; Abramson et al., 2024), the database paper if using AFDB models (Varadi et al., 2023; Varadi et al., 2021), and report version and confidence metrics (pLDDT distribution, PAE for relevant residue pairs) in the methods section. Treat predicted structures the way you would treat any computational result that informed an experimental design.
Can I use AlphaFold 3 commercially?
Read the current license before assuming. The terms have been a moving target since the May 2024 release. If commercial use is essential, the open-source AF3-class alternatives (Boltz-1, Chai-1) have permissive licenses that are easier to scope around.
Is structure prediction the same as protein design?
No. Prediction takes sequence and asks “what does this fold to?” Design takes a target geometry or function and asks “what sequence achieves this?” The systems share representational machinery and design pipelines often use prediction as a quality-control filter, but the validation evidence required is different. See Protein Design and Engineering.
Cross-References
- AI for the Life Sciences: Evidence framework and the Demonstrated / Theoretical / Beyond tiers
- Foundation Models for Biology: ESMFold and the protein language model lineage
- Protein Design and Engineering: RFdiffusion, generation pipelines, and design-validate-iterate loops
- Antibody and Biologic Design: Antibody-specific limitations of generic structure models
- Variant Effect Prediction: AlphaMissense and structure-informed variant classification
- Small Molecule Generation and ADMET: Structure-based design downstream of structure prediction
- Evaluation Principles for Biomedical Discovery AI: How to design a validation plan that turns a prediction into a decision
- Benchmarks for Bio AI: CASP, CAMEO, and the limits of static benchmarks