Protein Structure Prediction

Published: May 24, 2026

Protein structure prediction crossed a research threshold at CASP14, when AlphaFold 2 produced single-chain predictions whose median backbone accuracy was comparable to experimental structures (Jumper et al., 2021). Three years later, the AlphaFold Protein Structure Database covers over 214 million predicted structures (Varadi et al., 2023), effectively the predicted structural proteome of life. The hard questions are no longer whether a model can fold a protein; they are which molecular state the prediction represents, what ligand and partner geometry it captures, and what an experiment still has to prove.

Learning Objectives

This chapter treats protein structure prediction as an experimental decision aid, not a substitute for measurement. You will learn to:

  • Distinguish single-chain prediction from biomolecular-interaction prediction, and explain why the confidence guarantees differ
  • Read pLDDT, PAE, and ipTM as decision metrics rather than truth labels
  • Choose between AlphaFold 2/3, ESMFold, RoseTTAFold, OmegaFold, Boltz, and Chai based on the biological question and access constraints
  • Identify the failure modes that persist after high-confidence prediction: disorder, multiple conformations, induced fit, ligand chemistry, allostery
  • Frame structure prediction inside a validation plan (docking, mutagenesis, experimental structure, function assay) before committing program decisions
  • Evaluate vendor claims about “AlphaFold-powered” pipelines against published evidence
  • Navigate the AlphaFold 3 open-access controversy and the open-source alternatives (Boltz, Chai, OpenFold)

Prerequisites: AI for the Life Sciences recommended, Foundation Models for Biology helpful for the ESMFold and ESM-3 sections.

The Big Picture: Protein structure prediction is now a routine research input, not a research project. AlphaFold 2 effectively solved single-chain prediction for the majority of well-folded protein domains. AlphaFold 3 extended prediction to nucleic acids, ions, and small molecules but was released under restricted terms; open-source successors (Boltz, Chai) have since matched its capabilities. None of these systems removes the experiment; they change which experiment matters.

Model Landscape:

System | Best For | Open Code | Reads MSA? | Confidence Metrics
AlphaFold 2 (Jumper et al., 2021) | Single-chain structure | Yes | Yes | pLDDT, PAE
AlphaFold-Multimer (Evans et al., 2021) | Protein complexes | Yes | Yes | pLDDT, PAE, ipTM
AlphaFold 3 (Abramson et al., 2024) | Proteins, NAs, ligands, ions | Restricted | Yes | pLDDT, PAE, ipTM
RoseTTAFold (Baek et al., 2021) | Open AF2-class alternative | Yes | Yes | pLDDT, PAE
ESMFold (Lin et al., 2023) | Fast, MSA-free single-chain | Yes | No (LM only) | pLDDT, PAE
OmegaFold (Wu et al., 2022, preprint) | Orphan and designed proteins | Yes | No | pLDDT
Boltz-1 (Wohlwend et al., 2024, preprint) | Open AF3-class interactions | Yes (MIT) | Yes | pLDDT, PAE, ipTM
Boltz-2 (Passaro et al., 2025, preprint) | Adds binding-affinity prediction | Yes (MIT) | Yes | Affinity + structure
Chai-1 (Chai Discovery et al., 2024, preprint) | Open AF3-class interactions | Yes | Yes | pLDDT, PAE, ipTM

Confidence Metrics (Critical to Interpret Correctly):

  • pLDDT > 90: High confidence; side chains usable
  • pLDDT 70-90: Confident backbone; treat details cautiously
  • pLDDT 50-70: Low confidence; likely correct fold, details unreliable
  • pLDDT < 50: Unreliable, often intrinsically disordered
  • PAE matrix: Read inter-domain blocks separately; confidence within domains does not imply a confident relative orientation
  • ipTM > 0.8 (complexes): Interface is likely correct
  • ipTM 0.6-0.8: Interface plausible; treat the complex as a hypothesis, not a structure
  • ipTM < 0.6: Do not commit experimental design to the complex prediction

What Structure Prediction Solves vs. What It Does Not:

Solved (for most proteins) | Still Open
Fold topology | Conformational ensembles
Domain boundaries | Allosteric states
Stable side-chain packing | Disordered regions
Many protein-protein interfaces | Cryptic and induced-fit pockets
Crude apo geometry | Ligand chemistry beyond training distribution
Initial homology models | Water networks and protonation

Decision Framework for a Research Program:

  1. Identify the biological state you need. Apo? Holo? Active? Inhibited? Membrane-bound? A predicted structure is one state, not all states.
  2. Choose the system by question, not brand. Single-chain monomer with deep MSA → AlphaFold 2. Antibody-antigen → AlphaFold 3 or Boltz. Designed scaffold with no homologs → OmegaFold or ESMFold. On-premise pipeline at scale → Boltz or Chai.
  3. Read confidence before pretty pictures. pLDDT and PAE are not optional.
  4. Plan the validating experiment up front. Mutagenesis, binding assay, low-resolution experimental structure, HDX-MS, cryo-EM, or X-ray: pick the one that will move the program decision.
  5. Document failure modes. Predicted geometry that disagrees with a measured constraint is information about the model’s limits, not noise.

The AlphaFold 3 Open-Access Controversy:

AlphaFold 3 was published in Nature in May 2024 but was initially released as a rate-limited server without training code, restricting reproducibility and commercial use. The community responded with open-source alternatives: Chai-1 (Chai Discovery, October 2024), Boltz-1 (MIT-licensed, November 2024), and ongoing OpenFold work. In October 2024, the Nobel Prize in Chemistry went to Hassabis and Jumper for protein structure prediction and to Baker for computational protein design. The open-source response is the operationally relevant story for research programs that need on-premise inference (Naddaf, Nature news, 2025; Palmer, Engineering, 2025).

The Takeaway for Research Programs:

Structure prediction is now a commodity input, not a competitive advantage. The competitive advantage is the validation plan, the experimental follow-through, and the discipline to read confidence outputs honestly. A high-confidence AlphaFold structure of a protein you cannot express, purify, or assay is a slide, not a program. Treat every prediction as a hypothesis with a per-residue uncertainty budget, and design the experiment that turns the hypothesis into a decision.


Introduction: From Grand Challenge to Routine Input

December 2020, CASP14 (online): John Moult announces the results of the biennial Critical Assessment of protein Structure Prediction, in which AlphaFold 2 achieved a median backbone accuracy of about 1 Å (0.96 Å r.m.s.d.95) across the assessed domains. The next-best method reached a median of 2.8 Å. The protein-folding grand challenge, set 50 years earlier, is solved for most cases, in the sense Moult and others meant it (Jumper et al., 2021).

July 2022: DeepMind and EMBL-EBI expand the AlphaFold Protein Structure Database, first launched in July 2021, to predictions for nearly every cataloged UniProt protein. By the 2024 database update, coverage exceeds 214 million sequences (Varadi et al., 2023).

May 2024: AlphaFold 3 extends prediction to nucleic acids, ions, small-molecule ligands, and modified residues using a diffusion-based generative architecture (Abramson et al., 2024). Initial release is a web server with limits and without training code.

October 2024: The Royal Swedish Academy of Sciences awards the 2024 Nobel Prize in Chemistry jointly to Demis Hassabis and John Jumper (AlphaFold) and David Baker (computational protein design and Rosetta).

November 2024: MIT’s Barzilay and Jaakkola groups release Boltz-1, an open-source AlphaFold 3-class biomolecular interaction model (Wohlwend et al., 2024, preprint). Chai-1 (Chai Discovery et al., 2024, preprint), released a month earlier, had already established the community pattern.


These are not just modeling milestones. They are a change in the default operating assumption of structural biology: for many proteins, a usable structural model is available before any experiment is run.

The chapter that follows is organized around four questions a researcher actually faces:

  1. Which system should I use, and why?
  2. What does the confidence metric actually mean for my decision?
  3. What does the prediction not tell me?
  4. What experiment do I still owe before I commit a program decision?

The Architecture Lineage

Single-Chain Prediction (AlphaFold 2 and Successors)

AlphaFold 2 (Jumper et al., 2021) combined three ideas:

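  • Multiple sequence alignment (MSA) input. Evolutionary covariation across homologs encodes structural constraints. AlphaFold 2 embeds the MSA as a learned representation that the network updates alongside the pairwise residue features, rather than treating it as fixed input.
  • Triangle attention. Triangle updates and attention enforce geometric self-consistency over pairwise residue features, applied in each of the 48 Evoformer blocks.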
  • End-to-end differentiable structure module. A direct geometric prediction of backbone frames and side-chain rotamers, trained with structural and auxiliary losses.

The system was open-sourced alongside the publication. RoseTTAFold (Baek et al., 2021) shipped from the Baker lab in parallel with a different “three-track” architecture and was equally consequential because it established that AlphaFold 2-class accuracy was not tied to a single architecture.

MSA-Free Prediction (ESMFold and OmegaFold)

For orphan proteins (no homologs), de novo designs (no evolutionary signal), or speed-critical workflows, MSA generation is the bottleneck. Two responses emerged:

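  • ESMFold (Lin et al., 2023). ESMFold adds a folding head to Meta’s ESM-2 protein language model, which Lin et al. scaled to 15 billion parameters, and predicts structure from sequence alone. Accuracy is below AlphaFold 2 on average, but inference is roughly an order of magnitude faster because no MSA search is required, which matters most when MSAs are weak or expensive.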
  • OmegaFold (Wu et al., 2022, preprint). Single-sequence prediction with comparable goals, useful particularly for designed proteins.

The practical rule: if your sequence has a deep MSA, prefer AlphaFold 2 or RoseTTAFold; if it doesn’t, ESMFold or OmegaFold may be the only honest option.

Complex and Interaction Prediction

  • AlphaFold-Multimer (Evans et al., 2021) extended AlphaFold 2 to multi-chain protein complexes, introducing the ipTM (interface predicted TM-score) metric for interface confidence.
  • AlphaFold 3 (Abramson et al., 2024) replaced the deterministic structure module with a diffusion-based generative head and extended the chemical scope to nucleic acids, ions, modified residues, and arbitrary small-molecule ligands.
  • Boltz-1 (Wohlwend et al., 2024, preprint) and Chai-1 (Chai Discovery et al., 2024, preprint) reproduced AlphaFold 3-class capabilities with permissive open-source licenses.
  • Boltz-2 (Passaro et al., 2025, preprint) added binding-affinity prediction alongside structure for the same chemistry.

Why “AlphaFold 3-class” Matters Operationally

For a research program that needs on-premise inference, batch screening across millions of candidates, or commercial-use rights, AlphaFold 3’s restricted release is a blocker. Boltz and Chai are not “AlphaFold 3 alternatives” in a marketing sense; for most production settings they are the instances of the capability that can actually be deployed. Read the licenses before scoping a project.


Reading Confidence Metrics

A confident-looking AlphaFold cartoon is not evidence. The confidence outputs are.

pLDDT (predicted Local Distance Difference Test)

A per-residue score, 0 to 100, predicting how well the local atomic environment matches what an experimental structure would show. The DeepMind-recommended bands are:

Band | Interpretation | Practical Use
> 90 | Very high | Side-chain placement usable for docking, mutagenesis design
70-90 | Confident | Backbone reliable; side chains often correct; treat ligand interactions cautiously
50-70 | Low | Likely the correct fold; do not trust local details
< 50 | Very low | Often intrinsically disordered; treat as “no structural prediction”
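
The bands above can be applied mechanically to a downloaded model. The following is a minimal sketch, assuming the model is an AlphaFold-style PDB file in which the B-factor column of each atom carries the residue's pLDDT. The function names and band labels are illustrative, not part of any AlphaFold tooling.

```python
from collections import Counter

BANDS = ("very_high", "confident", "low", "very_low")

def plddt_band(score: float) -> str:
    """Map a pLDDT value (0-100) to the band names used in the table above."""
    if score > 90:
        return "very_high"
    if score >= 70:
        return "confident"
    if score >= 50:
        return "low"
    return "very_low"

def ca_plddt_from_pdb(pdb_text: str) -> list[float]:
    """Collect one pLDDT per residue from the B-factor field of CA atoms.

    Assumes AlphaFold-style output, where the B-factor column holds pLDDT.
    """
    scores = []
    for line in pdb_text.splitlines():
        # PDB fixed-width columns: atom name in 13-16, B-factor in 61-66.
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            scores.append(float(line[60:66]))
    return scores

def band_fractions(scores: list[float]) -> dict[str, float]:
    """Fraction of residues falling in each pLDDT band."""
    if not scores:
        raise ValueError("no CA atoms found")
    counts = Counter(plddt_band(s) for s in scores)
    return {band: counts[band] / len(scores) for band in BANDS}
```

Reporting the band fractions next to the structure figure makes it harder to overlook a model whose well-rendered cartoon is mostly low-confidence residues.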

Disordered regions are a feature of biology, not a model failure. AlphaFold’s low-pLDDT regions correlate strongly with experimentally determined disorder (Akdel et al., 2022).

PAE (Predicted Aligned Error)

A matrix giving the expected position error, in Å, at residue j when the predicted and true structures are aligned on residue i. The PAE matrix encodes domain structure and inter-domain confidence:

  • Block-diagonal low-PAE regions: Confident domains
  • Off-block-diagonal low-PAE regions: Confident relative orientation between domains
  • Off-block-diagonal high-PAE regions: Domain orientations are essentially unconstrained; the prediction is not a single global geometry but one plausible arrangement

The most common AlphaFold misinterpretation is treating a high-pLDDT multi-domain prediction as a confident full-length structure when the PAE matrix says inter-domain orientation is undetermined.
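
The block reading described above can be made explicit by averaging the PAE matrix within and between domains. This is a sketch under stated assumptions: the PAE matrix is a square list of lists in Å, the domain boundaries are supplied by the user as 0-based index ranges, and the 10 Å cutoff is an illustrative choice rather than a published threshold.

```python
from itertools import combinations_with_replacement

def block_mean(pae, rows, cols):
    """Average PAE over the block pae[i][j] for i in rows and j in cols."""
    values = [pae[i][j] for i in rows for j in cols]
    return sum(values) / len(values)

def domain_pae_summary(pae, domains, cutoff=10.0):
    """Summarise intra- and inter-domain PAE.

    domains: dict mapping a domain name to a range of 0-based residue indices
             (hypothetical input; boundaries are not derived here).
    cutoff:  illustrative threshold in Angstrom for calling a block confident.
    Returns {(name_a, name_b): (mean_pae, is_confident)}.
    """
    summary = {}
    for a, b in combinations_with_replacement(domains, 2):
        # PAE is not symmetric, so average the (a, b) and (b, a) blocks.
        mean = 0.5 * (block_mean(pae, domains[a], domains[b])
                      + block_mean(pae, domains[b], domains[a]))
        summary[(a, b)] = (round(mean, 2), mean <= cutoff)
    return summary
```

A pair such as ("N", "C") that fails the cutoff while ("N", "N") and ("C", "C") pass is the case the paragraph above warns about: two confident domains with an undetermined relative orientation.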

ipTM (Interface predicted TM-score): Complexes Only

For multi-chain predictions:

  • ipTM > 0.8: Interface likely correct
  • ipTM 0.6-0.8: Interface plausible, treat geometry as a hypothesis
  • ipTM < 0.6: Do not commit experimental design to this complex prediction

Combine with pTM (overall) and chain-pair ipTM when available.
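
A small helper can encode the tiers above and combine ipTM with pTM when ranking several models of the same complex. The 0.8/0.2 weighting below follows the model-confidence score reported for AlphaFold-Multimer (Evans et al., 2021); whether other systems rank the same way is an assumption to check against their documentation. The tier labels and function names are illustrative.

```python
def interface_tier(iptm: float) -> str:
    """Map ipTM to the three tiers listed above."""
    if iptm > 0.8:
        return "likely_correct"
    if iptm >= 0.6:
        return "plausible_hypothesis"
    return "do_not_commit"

def ranking_confidence(iptm: float, ptm: float) -> float:
    """Weighted score 0.8 * ipTM + 0.2 * pTM (AlphaFold-Multimer style)."""
    return 0.8 * iptm + 0.2 * ptm

def best_model(models: dict[str, tuple[float, float]]) -> str:
    """Pick the model name with the highest ranking confidence.

    models maps a model name to (ipTM, pTM); the names are hypothetical.
    """
    return max(models, key=lambda name: ranking_confidence(*models[name]))
```

The ranking score selects among models; it does not replace the tier check, since the best of several poor models can still fall in the lowest tier.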


What Structure Prediction Does Not Solve

Capability | Status | Why
Single-chain stable fold | Largely solved for well-folded domains | AlphaFold 2 + AFDB cover the majority of the structural proteome
Conformational ensembles | Open problem | A prediction is one state; many proteins occupy multiple functionally distinct conformations
Apo vs. holo geometry | Partially | Predictions tend to bias toward training distribution; cryptic pockets that only open with ligand are systematically missed
Allosteric and signaling states | Open | Activation states (GPCRs, kinases, ion channels) require additional context, sometimes co-folding with binding partners
Intrinsically disordered regions | Predictable as disordered, not as structures | The model honestly reports low pLDDT; reading this as “wrong” is the user’s error
Ligand chemistry beyond training | Limited | AF3-class systems extend chemical scope but novel ligand classes, covalent binders, and metal coordination remain hard
Membrane environments | Partial | Predictions can be plausible without an explicit membrane; lipid interactions and oligomeric state in-membrane often need experimental support
Hydrogen positions and water networks | No | Out of scope for these systems
Free-energy landscapes and kinetics | No | Static structures are not energetics; coupling with MD remains necessary
Mechanism | No | A structure is geometry, not function. Mechanism needs perturbation evidence.

A Specific Trap: “AlphaFold-Validated”

A vendor claim that a molecule was “AlphaFold-validated” against a target is, on its face, meaningless. AlphaFold predicts structure; it does not validate molecules. The honest version of the claim is: “we docked our molecule against an AlphaFold-predicted structure of the target and observed favorable scoring.” That claim, in turn, is one input to a validation campaign, not a substitute for binding, cell-based, and pharmacological evidence.


Decision Framework: Choosing a System

Situation | First Choice | Why
Single-chain protein with deep MSA, decision-relevant | AlphaFold 2 (or RoseTTAFold for open code) | Highest accuracy on the canonical task
Orphan / designed / fast-iteration single-chain | ESMFold (or OmegaFold) | MSA-free, fast
Protein-protein complex, moderate confidence acceptable | AlphaFold-Multimer | Established, well-characterized failure modes
Protein + ligand / NA / ion, cloud-acceptable | AlphaFold 3 web server | Highest published accuracy on the AF3 task set
Protein + ligand / NA / ion, on-premise required | Boltz-1 or Chai-1 | Open-source, AF3-class, deployable
Binding-affinity ranking matters | Boltz-2 (with caution and orthogonal validation) | Adds affinity head; still computational evidence that needs experimental confirmation
Antibody-antigen interface | AlphaFold 3 / Boltz / Chai with explicit caveats | Antibody loops remain a known weakness for all systems
Membrane protein in lipid environment | Any of the above + MD + experimental data | No system claims to handle membrane partitioning natively

The deeper rule: choose the system by the biology and the access constraints, not by the brand of the model.
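
For pipelines that route many targets automatically, the table above can be written as an explicit lookup. This is only a sketch of the table's logic: the `Question` fields and category strings are hypothetical names introduced here, and a real routing layer would also need to encode license and compute constraints.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Question:
    # One of: "single_chain", "complex", "ligand_na_ion", "antibody", "membrane"
    target: str
    deep_msa: bool = True         # is a deep MSA available?
    on_premise: bool = False      # must inference run locally?
    needs_affinity: bool = False  # is binding-affinity ranking the goal?

def first_choice(q: Question) -> str:
    """Return the first-choice system from the decision table above."""
    if q.needs_affinity:
        return "Boltz-2 (with caution and orthogonal validation)"
    if q.target == "single_chain":
        return "AlphaFold 2 or RoseTTAFold" if q.deep_msa else "ESMFold or OmegaFold"
    if q.target == "complex":
        return "AlphaFold-Multimer"
    if q.target == "ligand_na_ion":
        return "Boltz-1 or Chai-1" if q.on_premise else "AlphaFold 3 web server"
    if q.target == "antibody":
        return "AlphaFold 3 / Boltz / Chai (with explicit caveats)"
    if q.target == "membrane":
        return "Any system + MD + experimental data"
    raise ValueError(f"unknown target type: {q.target!r}")
```

Keeping the rule in one function makes the choice auditable: the methods section can cite the inputs that led to a given system rather than a habit or a brand.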


The AlphaFold 3 Open-Access Episode

AlphaFold 3 was published in Nature in May 2024 (Abramson et al., 2024). The initial release made the model available through a hosted server with rate limits, restricted commercial use, and no training code. The structural biology and ML communities pushed back, both on reproducibility grounds (a Nature paper without runnable training code is a hard precedent) and on operational grounds (research programs cannot use a server-only model for batch screening, IP-sensitive work, or air-gapped environments).

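The response was infrastructural, not rhetorical:

  • Chai-1 (Chai Discovery et al., 2024, preprint), released in October 2024
  • Boltz-1 (Wohlwend et al., 2024, preprint), released under the MIT license in November 2024
  • Ongoing OpenFold work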

In October 2024, the Nobel Prize in Chemistry was awarded jointly to Demis Hassabis and John Jumper for AlphaFold and to David Baker for computational protein design. Coverage in Nature’s news (Naddaf, 2025) and Engineering commentary (Palmer, 2025) chronicled both the scientific recognition and the open-access debate.

The operational lesson for research programs: licensing and code availability are first-class evaluation criteria, not afterthoughts. A model that cannot run in your environment is, for your program, not the state of the art.


Confidence-Calibration Failure Modes

A list of pitfalls that recur in practice:

  1. High pLDDT, wrong orientation. Two domains can each be confidently predicted while the PAE matrix shows inter-domain orientation is undetermined. The cartoon will look definitive; the science isn’t.
  2. High ipTM, wrong interface in cells. A confident complex prediction can correspond to an interaction that does not occur at endogenous expression levels, in the relevant cellular compartment, or in the presence of competitor partners.
  3. Training-distribution overlap. AlphaFold 2 was trained on PDB structures available at the time. Targets with close homologs in the training set are predicted with higher confidence than truly novel targets. Always check for near-neighbors in PDB before trusting a high score on a “novel” target.
  4. Hallucinated ligand poses. AF3-class systems can produce confident-looking ligand placements for chemistry that is out-of-distribution. Treat ligand pose predictions as docking hypotheses, not as bound structures.
  5. Disorder mis-read. Low-pLDDT regions are not “wrong”: they are predictions of disorder. Removing them and re-rendering the structure is a self-deception.
  6. Single state, multi-state biology. GPCRs, kinases, ion channels, allosteric enzymes, and any protein with a meaningful conformational cycle are not described by a single structure, predicted or experimental.
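
For the disorder pitfall in particular, it helps to report low-confidence stretches explicitly instead of trimming them from the figure. The sketch below finds contiguous runs of residues under a pLDDT cutoff. The default cutoff of 50 follows the bands used in this chapter; the minimum run length of 5 is an illustrative assumption.

```python
def low_confidence_segments(plddt, cutoff=50.0, min_len=5):
    """Return 1-based inclusive (start, end) ranges where pLDDT < cutoff.

    plddt:   per-residue scores in sequence order
    min_len: shortest run worth reporting (illustrative default)
    """
    segments = []
    start = None
    for idx, score in enumerate(plddt, start=1):
        if score < cutoff:
            if start is None:
                start = idx  # a new low-confidence run begins here
        elif start is not None:
            if idx - start >= min_len:
                segments.append((start, idx - 1))
            start = None
    # Close a run that extends to the C-terminus.
    if start is not None and len(plddt) - start + 1 >= min_len:
        segments.append((start, len(plddt)))
    return segments
```

Listing these ranges in the methods section, alongside any experimental disorder annotation, keeps the predicted disorder visible as a result rather than hiding it.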

Downstream Applications and Their Limits

Predicted structures feed into:

  • Variant effect prediction. AlphaMissense (Cheng et al., 2023) used AlphaFold-derived features to classify approximately 89% of the 71 million possible human missense variants as likely benign or likely pathogenic. The classifier is a research tool; clinical use requires the standard regulatory pathway and independent validation. See Variant Effect Prediction.
  • Protein design. RFdiffusion (Watson et al., 2023) and related systems generate protein backbones with specified geometries, which a separate sequence-design step then turns into sequences. AlphaFold-predicted structures of the designs are an iteration-loop component, not a substitute for experimental characterization. See Protein Design and Engineering.
  • Structure-based drug design. Predicted structures support docking and pocket analysis but inherit all the conformational and ligand-chemistry limitations above. See Small Molecule Generation and ADMET.
  • Functional annotation. Predicted folds can suggest functional class by structural homology, which is useful for orphan proteins but is not a substitute for biochemical or genetic validation.

Frontier Directions

Two directions are worth tracking:

  • Generative protein modeling at scale. ESM-3 (Hayes et al., 2024, preprint), from EvolutionaryScale, is a multi-modal generative protein language model reasoning over sequence, structure, and function simultaneously. The relevance to prediction (as opposed to design) is the convergence: structure-prediction models and design models share a representational substrate. A practical implication is that future “prediction” workflows will increasingly produce ensembles and conditional generations, not single structures.
  • Affinity-aware interaction models. Boltz-2’s affinity head (Passaro et al., 2025, preprint) is an early example of folding-class models reaching into binding-affinity territory. Treat early claims with the same skepticism applied to any structure-then-ranking pipeline: docking-score correlation with experimental affinity is famously weak across the docking literature.

Practice Notes

  • Read pLDDT and PAE before you read the cartoon. Pretty structures persuade; metrics protect you.
  • For consequential decisions, compute multiple predictions (different seeds, MSA depths, or systems) and look at consensus. A prediction that flips under reasonable perturbation is a low-confidence prediction regardless of pLDDT.
  • For drug-discovery work, do not commit chemistry to a predicted pocket without (a) an experimental structure of a homolog or (b) explicit biochemical evidence that the pocket exists. Cryptic pockets are systematically under-represented in predicted structures.
  • For antibody-antigen work, treat any single prediction as a hypothesis-grade interface until validated by mutagenesis, alanine scanning, or epitope mapping.
  • For conformational-cycle proteins (GPCRs, kinases, transporters), predict multiple states (different templates, conformation-specific MSAs) and validate against any available state-specific experimental structures before functional interpretation.
  • License diligence is part of method selection. Server-only models are not a deployable infrastructure; permissive open-source releases (Boltz, Chai, OpenFold) often are.
  • Cite the underlying paper and the model version. “AlphaFold” without a version is ambiguous; 2, 2.3, 3, and Multimer are different systems with different scope and different licenses.
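
The consensus check in the notes above (multiple seeds, MSA depths, or systems) needs a way to compare predictions. One alignment-free option, sketched here, is the RMSD between the pairwise Cα distance matrices of two models. This is a swapped-in simplification rather than the superposition-based RMSD most tools report, it cannot distinguish a structure from its mirror image, and the 2.0 Å tolerance is an illustrative assumption.

```python
import math
from itertools import combinations

def distance_rmsd(coords_a, coords_b):
    """RMSD between the pairwise-distance matrices of two CA traces.

    coords_a, coords_b: equal-length sequences of (x, y, z) tuples in Angstrom.
    No superposition is needed because internal distances are compared.
    """
    if len(coords_a) != len(coords_b):
        raise ValueError("predictions must cover the same residues")
    if len(coords_a) < 2:
        raise ValueError("need at least two residues")
    squared, n_pairs = 0.0, 0
    for i, j in combinations(range(len(coords_a)), 2):
        d_a = math.dist(coords_a[i], coords_a[j])
        d_b = math.dist(coords_b[i], coords_b[j])
        squared += (d_a - d_b) ** 2
        n_pairs += 1
    return math.sqrt(squared / n_pairs)

def is_stable(predictions, tolerance=2.0):
    """True if every pair of predictions agrees within the tolerance."""
    return all(distance_rmsd(a, b) <= tolerance
               for a, b in combinations(predictions, 2))
```

A set of predictions that fails `is_stable` under a reasonable tolerance is a candidate for the low-confidence treatment described above, whatever its pLDDT.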

Common Questions

Did AlphaFold “solve protein folding”?

For the operational definition Moult and CASP used (predict static three-dimensional structure from sequence for a typical well-folded protein domain at experimental accuracy), yes, for the majority of cases at CASP14 (Jumper et al., 2021). For the biological problem people sometimes mean (predict function, mechanism, conformational dynamics, and binding from sequence), no. The capability change is substantial; the framing matters.

Should I still pursue experimental structure determination?

Yes, for state-specific structures, complexes that AlphaFold’s confidence flags as uncertain, ligand-bound geometries that matter for chemistry, and any structure that will appear in a regulatory submission. Predicted structures are excellent first drafts and excellent screening tools. They are not substitutes for cryo-EM, X-ray, or NMR data that will support a publication or a development candidate.

Is AlphaFold 3 actually better than AlphaFold 2 for proteins-only tasks?

For single-chain protein-only predictions, AlphaFold 2 remains the well-characterized baseline. AlphaFold 3’s value is in the expanded chemical scope (NAs, ligands, ions, modifications). For a single-chain monomer where you do not need that scope, AlphaFold 2 is the simpler, better-understood, more reproducible choice.

How do I cite AlphaFold predictions in a paper?

Cite the underlying paper for the system used (Jumper et al., 2021; Abramson et al., 2024), the database paper if using AFDB models (Varadi et al., 2023; Varadi et al., 2021), and report version and confidence metrics (pLDDT distribution, PAE for relevant residue pairs) in the methods section. Treat predicted structures the way you would treat any computational result that informed an experimental design.

Can I use AlphaFold 3 commercially?

Read the current license before assuming. The terms have been a moving target since the May 2024 release. If commercial use is essential, the open-source AF3-class alternatives (Boltz-1, Chai-1) have permissive licenses that are easier to scope around.

Is structure prediction the same as protein design?

No. Prediction takes sequence and asks “what does this fold to?” Design takes a target geometry or function and asks “what sequence achieves this?” The systems share representational machinery and design pipelines often use prediction as a quality-control filter, but the validation evidence required is different. See Protein Design and Engineering.


Cross-References