The Life Sciences AI Handbook: AI for Biomedical Discovery, Biotechnology, and Translational Research

Name: The Life Sciences AI Handbook
Author: Bryan Tegomoh

Tegomoh, Bryan; [Bryan Tegomoh, MD, MPH](https://bryantegomoh.com/)

Protein Structure Prediction

Published

May 24, 2026

Protein structure prediction crossed a research threshold at CASP14, when AlphaFold 2 produced single-chain predictions whose median backbone accuracy was comparable to experimental structures (Jumper et al., 2021). Three years later, the AlphaFold Protein Structure Database covers over 214 million predicted structures (Varadi et al., 2023): effectively the predicted structural proteome of life. The hard questions are no longer whether a model can fold a protein; they are which molecular state the prediction represents, what ligand and partner geometry it captures, and what an experiment still has to prove.

Learning Objectives

This chapter treats protein structure prediction as an experimental decision aid, not a substitute for measurement. You will learn to:

Distinguish single-chain prediction from biomolecular-interaction prediction, and explain why the confidence guarantees differ
Read pLDDT, PAE, and ipTM as decision metrics rather than truth labels
Choose between AlphaFold 2/3, ESMFold, RoseTTAFold, OmegaFold, Boltz, and Chai based on the biological question and access constraints
Identify the failure modes that persist after high-confidence prediction: disorder, multiple conformations, induced fit, ligand chemistry, allostery
Frame structure prediction inside a validation plan (docking, mutagenesis, experimental structure, function assay) before committing program decisions
Evaluate vendor claims about “AlphaFold-powered” pipelines against published evidence
Navigate the AlphaFold 3 open-access controversy and the open-source alternatives (Boltz, Chai, OpenFold)

Prerequisites: AI for the Life Sciences is recommended; Foundation Models for Biology is helpful for the ESMFold and ESM-3 sections.

Chapter Summary (TL;DR)

The Big Picture: Protein structure prediction is now a routine research input, not a research project. AlphaFold 2 effectively solved single-chain prediction for the majority of well-folded protein domains. AlphaFold 3 extended prediction to nucleic acids, ions, and small molecules but was released under restricted terms; open-source successors (Boltz, Chai) have since offered AF3-class capability under more permissive licenses. None of these systems removes the experiment; they change which experiment matters.

Model Landscape:

System	Best For	Open Code	Reads MSA?	Confidence Metrics
AlphaFold 2 (Jumper et al., 2021)	Single-chain structure	Yes	Yes	pLDDT, PAE
AlphaFold-Multimer (Evans et al., 2021)	Protein complexes	Yes	Yes	pLDDT, PAE, ipTM
AlphaFold 3 (Abramson et al., 2024)	Proteins, NAs, ligands, ions	Restricted	Yes	pLDDT, PAE, ipTM
RoseTTAFold (Baek et al., 2021)	Open AF2-class alternative	Yes	Yes	pLDDT, PAE
ESMFold (Lin et al., 2023)	Fast, MSA-free single-chain	Yes	No (LM only)	pLDDT, PAE
OmegaFold (Wu et al., 2022, preprint)	Orphan and designed proteins	Yes	No	pLDDT
Boltz-1 (Wohlwend et al., 2024, preprint)	Open AF3-class interactions	Yes (MIT)	Yes	pLDDT, PAE, ipTM
Boltz-2 (Passaro et al., 2025, preprint)	Adds binding-affinity prediction	Yes (MIT)	Yes	Affinity + structure
Chai-1 (Chai Discovery et al., 2024, preprint)	Open AF3-class interactions	Yes	Yes	pLDDT, PAE, ipTM

Confidence Metrics (Critical to Interpret Correctly):

pLDDT > 90: Very high confidence; side chains usable
pLDDT 70-90: Confident backbone; treat details cautiously
pLDDT 50-70: Low confidence; likely correct fold, details unreliable
pLDDT < 50: Unreliable, often intrinsically disordered
PAE matrix: Read inter-domain blocks separately; confidence within domains does not imply confidence in their relative orientation
ipTM > 0.8 (complexes): Interface is likely correct
ipTM < 0.6: Likely a failed complex prediction; do not commit experimental design to it

What Structure Prediction Solves vs. What It Does Not:

Solved (for most proteins)	Still Open
Fold topology	Conformational ensembles
Domain boundaries	Allosteric states
Stable side-chain packing	Disordered regions
Many protein-protein interfaces	Cryptic and induced-fit pockets
Crude apo geometry	Ligand chemistry beyond training distribution
Initial homology models	Water networks and protonation

Decision Framework for a Research Program:

Identify the biological state you need. Apo? Holo? Active? Inhibited? Membrane-bound? A predicted structure is one state, not all states.
Choose the system by question, not brand. Single-chain monomer with deep MSA → AlphaFold 2. Antibody-antigen → AlphaFold 3 or Boltz. Designed scaffold with no homologs → OmegaFold or ESMFold. On-premise pipeline at scale → Boltz or Chai.
Read confidence before pretty pictures. pLDDT and PAE are not optional.
Plan the validating experiment up front. Mutagenesis, binding assay, low-resolution experimental structure, HDX-MS, cryo-EM, or X-ray: pick the one that will move the program decision.
Document failure modes. Predicted geometry that disagrees with a measured constraint is information about the model’s limits, not noise.

The AlphaFold 3 Open-Access Controversy:

AlphaFold 3 was published in Nature in May 2024 but released initially as a server with rate limits and no training code, restricting reproducibility and commercial use. The community responded with open-source alternatives: Boltz-1 (MIT, November 2024), Chai-1 (Chai Discovery, October 2024), and ongoing OpenFold work. In October 2024, the Royal Swedish Academy of Sciences awarded the Nobel Prize in Chemistry to Demis Hassabis and John Jumper for protein structure prediction and to David Baker for computational protein design. The open-source backlash is the operationally relevant story for research programs that need on-premise inference (Naddaf, Nature news, 2025; Palmer, Engineering, 2025).

The Takeaway for Research Programs:

Structure prediction is now a commodity input, not a competitive advantage. The competitive advantage is the validation plan, the experimental follow-through, and the discipline to read confidence outputs honestly. A high-confidence AlphaFold structure of a protein you cannot express, purify, or assay is a slide, not a program. Treat every prediction as a hypothesis with a per-residue uncertainty budget, and design the experiment that turns the hypothesis into a decision.

Introduction: From Grand Challenge to Routine Input

December 2020, CASP14 (online): John Moult announces that AlphaFold 2 has achieved a median backbone accuracy of about 1 Å (0.96 Å r.m.s.d.95) across the target domains of the biennial Critical Assessment of protein Structure Prediction, against 2.8 Å for the next-best method. The protein-folding grand challenge, set 50 years earlier, is, in the sense Moult and others meant it, solved for most cases (Jumper et al., 2021).

July 2022: DeepMind and EMBL-EBI expand the AlphaFold Protein Structure Database, first launched in July 2021, to cover nearly every cataloged UniProt protein. By the 2024 database update, coverage exceeds 214 million sequences (Varadi et al., 2023).

May 2024: AlphaFold 3 extends prediction to nucleic acids, ions, small-molecule ligands, and modified residues using a diffusion-based generative architecture (Abramson et al., 2024). Initial release is a web server with limits and without training code.

October 2024: The Royal Swedish Academy of Sciences awards the 2024 Nobel Prize in Chemistry jointly to Demis Hassabis and John Jumper (AlphaFold) and David Baker (computational protein design and Rosetta).

November 2024: MIT’s Barzilay and Jaakkola groups release Boltz-1, an open-source AlphaFold 3-class biomolecular interaction model (Wohlwend et al., 2024, preprint). The earlier open Chai-1 (Chai Discovery et al., 2024, preprint) had already established the community pattern.

These are not just modeling milestones. They are a change in the default operating assumption of structural biology: for many proteins, a usable structural model is available before any experiment is run.

The chapter that follows is organized around four questions a researcher actually faces:

Which system should I use, and why?
What does the confidence metric actually mean for my decision?
What does the prediction not tell me?
What experiment do I still owe before I commit a program decision?

The Architecture Lineage

Single-Chain Prediction (AlphaFold 2 and Successors)

AlphaFold 2 (Jumper et al., 2021) combined three ideas:

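Multiple sequence alignment (MSA) input. Evolutionary covariation across homologs encodes structural constraints. AlphaFold 2 takes the MSA itself as an input representation and refines it jointly with a pairwise residue representation, rather than working from precomputed coupling statistics.
Triangle attention. Triangle attention and triangle multiplicative updates push the pairwise residue features toward geometric self-consistency, repeated across 48 Evoformer blocks.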
End-to-end differentiable structure module. A direct geometric prediction of backbone frames and side-chain rotamers, trained with structural and auxiliary losses.

The system was open-sourced shortly after publication. RoseTTAFold (Baek et al., 2021) shipped from the Baker lab in parallel with a different “three-track” architecture and was equally consequential because it established that AlphaFold 2’s accuracy was not architecture-locked.

MSA-Free Prediction (ESMFold and OmegaFold)

For orphan proteins (no homologs), de novo designs (no evolutionary signal), or speed-critical workflows, MSA generation is the bottleneck. Two responses emerged:

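ESMFold (Lin et al., 2023). Built on Meta’s ESM-2 protein language model, which the paper scales up to 15 billion parameters, ESMFold predicts structure from a single sequence. Average accuracy is below AlphaFold 2, but skipping the MSA search makes inference roughly an order of magnitude faster, which matters most when MSAs are weak or expensive to build.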
OmegaFold (Wu et al., 2022, preprint). Single-sequence prediction with comparable goals, useful particularly for designed proteins.

The practical rule: if your sequence has a deep MSA, prefer AlphaFold 2 or RoseTTAFold; if it doesn’t, ESMFold or OmegaFold may be the only honest option.

Complex and Interaction Prediction

AlphaFold-Multimer (Evans et al., 2021) extended AlphaFold 2 to multi-chain protein complexes, introducing the ipTM (interface predicted TM-score) metric for interface confidence.
AlphaFold 3 (Abramson et al., 2024) replaced the deterministic structure module with a diffusion-based generative head and extended the chemical scope to nucleic acids, ions, modified residues, and arbitrary small-molecule ligands.
Boltz-1 (Wohlwend et al., 2024, preprint) and Chai-1 (Chai Discovery et al., 2024, preprint) reproduced AlphaFold 3-class capabilities with permissive open-source licenses.
Boltz-2 (Passaro et al., 2025, preprint) added binding-affinity prediction alongside structure for the same chemistry.

Why “AlphaFold 3-class” Matters Operationally

For a research program that needs on-premise inference, batch screening across millions of candidates, or commercial-use rights, AlphaFold 3’s restricted release is a blocker. Boltz and Chai are not “AlphaFold 3 alternatives” in a marketing sense: they are the actually-usable instances of the capability for most production settings. Read the licenses before scoping a project.

Reading Confidence Metrics

A confident-looking AlphaFold cartoon is not evidence. The confidence outputs are.

pLDDT (predicted Local Distance Difference Test)

A per-residue score, 0 to 100, predicting how well the local atomic environment matches what an experimental structure would show. The DeepMind-recommended bands are:

Band	Interpretation	Practical Use
> 90	Very high	Side-chain placement usable for docking, mutagenesis design
70-90	Confident	Backbone reliable; side chains often correct; treat ligand interactions cautiously
50-70	Low	Likely the correct fold; do not trust local details
< 50	Very low	Often intrinsically disordered; treat as “no structural prediction”

Disordered regions are a feature of biology, not a model failure. AlphaFold’s low-pLDDT regions correlate strongly with experimentally determined disorder (Akdel et al., 2022).
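The bands above can be applied mechanically before anyone opens a structure viewer. The following is a minimal sketch, assuming PDB-format output in which per-residue pLDDT is stored in the B-factor field, as AlphaFold 2 and AlphaFold DB files do. The function names and band labels are illustrative, not part of any AlphaFold tooling.

```python
from collections import Counter

BANDS = ("very_high", "confident", "low", "very_low")  # labels are this sketch's own


def plddt_band(score: float) -> str:
    """Map a pLDDT value (0-100) to one of the four bands in the table above."""
    if not 0.0 <= score <= 100.0:
        raise ValueError(f"pLDDT must be in [0, 100], got {score}")
    if score > 90:
        return "very_high"
    if score >= 70:
        return "confident"
    if score >= 50:
        return "low"
    return "very_low"


def ca_plddt_from_pdb(pdb_text: str) -> list[float]:
    """Read one pLDDT value per residue from the CA atoms of PDB-format text.

    Assumption: pLDDT sits in the B-factor field (columns 61-66).
    """
    scores = []
    for line in pdb_text.splitlines():
        # Columns 13-16 hold the atom name; ATOM records only, so calcium
        # ions (HETATM records named CA) are not picked up.
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            scores.append(float(line[60:66]))
    return scores


def band_fractions(scores: list[float]) -> dict[str, float]:
    """Fraction of residues falling in each band."""
    if not scores:
        raise ValueError("no residues to summarize")
    counts = Counter(plddt_band(s) for s in scores)
    return {band: counts.get(band, 0) / len(scores) for band in BANDS}
```

Reporting the band fractions alongside the structure makes it harder to present a model whose decision-relevant region sits in the low or very low band as if it were uniformly confident.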

PAE (Predicted Aligned Error)

A matrix giving the expected position error, in ångströms, at residue j when the predicted and true structures are aligned on residue i. The matrix is not symmetric. It encodes domain structure and inter-domain confidence:

Block-diagonal low-PAE regions: Confident domains
Off-block-diagonal low-PAE regions: Confident relative orientation between domains
Off-block-diagonal high-PAE regions: Domain orientations are essentially unconstrained; the prediction shows one plausible arrangement, not a determined global geometry

The most common AlphaFold misinterpretation is treating a high-pLDDT multi-domain prediction as a confident full-length structure when the PAE matrix says inter-domain orientation is undetermined.
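The block reading described above can be made explicit by averaging PAE within and between domains. This is a minimal sketch, assuming the PAE matrix is already loaded as a square list of lists in ångströms and that domain boundaries are supplied by the user. The 10 Å cutoff is an illustrative assumption, not a published threshold, and the function names are hypothetical.

```python
def mean_pae_block(pae, rows, cols):
    """Mean PAE (in Å) over the block rows x cols, skipping diagonal entries."""
    values = [pae[i][j] for i in rows for j in cols if i != j]
    if not values:
        raise ValueError("block contains no off-diagonal entries")
    return sum(values) / len(values)


def domain_pae_summary(pae, domains):
    """Mean PAE for every ordered pair of domains.

    domains maps a domain name to a range of 0-based residue indices.
    Both (a, b) and (b, a) are kept because PAE is not symmetric.
    """
    return {
        (a, b): mean_pae_block(pae, domains[a], domains[b])
        for a in domains
        for b in domains
    }


def orientation_is_constrained(summary, a, b, max_inter_pae=10.0):
    """Heuristic check: inter-domain PAE is low in both alignment directions.

    max_inter_pae is an assumed cutoff; tune it to the decision at hand.
    """
    return max(summary[(a, b)], summary[(b, a)]) <= max_inter_pae


# Toy example: two confident two-residue domains with undetermined orientation.
toy_pae = [
    [0.0, 2.0, 25.0, 25.0],
    [2.0, 0.0, 25.0, 25.0],
    [25.0, 25.0, 0.0, 2.0],
    [25.0, 25.0, 2.0, 0.0],
]
toy_summary = domain_pae_summary(toy_pae, {"N": range(0, 2), "C": range(2, 4)})
```

In the toy matrix both domains are internally confident, yet `orientation_is_constrained(toy_summary, "N", "C")` is false, which is exactly the case where a full-length cartoon overstates what the model knows.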

ipTM (Interface predicted TM-score): Complexes Only

For multi-chain predictions:

ipTM > 0.8: Interface likely correct
ipTM 0.6-0.8: Interface plausible, treat geometry as a hypothesis
ipTM < 0.6: Do not commit experimental design to this complex prediction

Combine with pTM (overall) and chain-pair ipTM when available.
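The thresholds above, and the combination with pTM, can be written down as a small triage helper. This is a sketch under stated assumptions: the 0.8/0.2 weighting is the ranking confidence described for AlphaFold-Multimer (Evans et al., 2021), the category labels are this chapter's wording rather than any tool's output, and the function names are hypothetical.

```python
def interface_triage(iptm: float) -> str:
    """Map ipTM to the three bands listed above."""
    if not 0.0 <= iptm <= 1.0:
        raise ValueError(f"ipTM must be in [0, 1], got {iptm}")
    if iptm > 0.8:
        return "likely_correct"
    if iptm >= 0.6:
        return "hypothesis"
    return "do_not_commit"


def multimer_ranking_confidence(iptm: float, ptm: float) -> float:
    """Weighted score used to rank AlphaFold-Multimer models: 0.8*ipTM + 0.2*pTM."""
    return 0.8 * iptm + 0.2 * ptm


def weakest_chain_pair(pair_iptm: dict[tuple[str, str], float]) -> tuple[tuple[str, str], float]:
    """Return the chain pair with the lowest ipTM and its value.

    pair_iptm maps (chain_a, chain_b) to a chain-pair ipTM, where the
    prediction system reports one.
    """
    if not pair_iptm:
        raise ValueError("no chain pairs supplied")
    pair = min(pair_iptm, key=pair_iptm.get)
    return pair, pair_iptm[pair]
```

Looking at the weakest chain pair rather than the global ipTM helps when one well-predicted interface in a larger assembly masks a poorly predicted one.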

What Structure Prediction Does Not Solve

Capability	Status	Why
Single-chain stable fold	Largely solved for well-folded domains	AlphaFold 2 + AFDB cover the majority of the structural proteome
Conformational ensembles	Open problem	A prediction is one state; many proteins occupy multiple functionally distinct conformations
Apo vs. holo geometry	Partially	Predictions tend to bias toward training distribution; cryptic pockets that only open with ligand are systematically missed
Allosteric and signaling states	Open	Activation states (GPCRs, kinases, ion channels) require additional context, sometimes co-folding with binding partners
Intrinsically disordered regions	Predictable as disordered, not as structures	The model honestly reports low pLDDT; reading this as “wrong” is the user’s error
Ligand chemistry beyond training	Limited	AF3-class systems extend chemical scope but novel ligand classes, covalent binders, and metal coordination remain hard
Membrane environments	Partial	Predictions can be plausible without an explicit membrane; lipid interactions and oligomeric state in-membrane often need experimental support
Hydrogen positions and water networks	No	Out of scope for these systems
Free-energy landscapes and kinetics	No	Static structures are not energetics; coupling with MD remains necessary
Mechanism	No	A structure is geometry, not function. Mechanism needs perturbation evidence.

A Specific Trap: “AlphaFold-Validated”

A vendor claim that a molecule was “AlphaFold-validated” against a target is, on its face, meaningless. AlphaFold predicts structure; it does not validate molecules. The honest version of the claim is: “we docked our molecule against an AlphaFold-predicted structure of the target and observed favorable scoring.” That claim, in turn, is one input to a validation campaign, not a substitute for binding, cell-based, and pharmacological evidence.

Decision Framework: Choosing a System

Situation	First Choice	Why
Single-chain protein with deep MSA, decision-relevant	AlphaFold 2 (or RoseTTAFold for open code)	Highest accuracy on the canonical task
Orphan / designed / fast-iteration single-chain	ESMFold (or OmegaFold)	MSA-free, fast
Protein-protein complex, moderate confidence acceptable	AlphaFold-Multimer	Established, well-characterized failure modes
Protein + ligand / NA / ion, cloud-acceptable	AlphaFold 3 web server	Highest published accuracy on the AF3 task set
Protein + ligand / NA / ion, on-premise required	Boltz-1 or Chai-1	Open-source, AF3-class, deployable
Binding-affinity ranking matters	Boltz-2 (with caution and orthogonal validation)	Adds affinity head; predicted affinities remain hypothesis-grade evidence until measured
Antibody-antigen interface	AlphaFold 3 / Boltz / Chai with explicit caveats	Antibody loops remain a known weakness for all systems
Membrane protein in lipid environment	Any of the above + MD + experimental data	No system claims to handle membrane partitioning natively

The deeper rule: choose the system by the biology and the access constraints, not by the brand of the model.
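One way to keep the table honest in a pipeline is to encode it as an explicit rule set, so the choice of system is reviewable rather than habitual. The sketch below is one possible encoding. The field names, the rule order (membrane first, then affinity, then antibody), and the returned strings are assumptions made for illustration; the recommendations themselves are taken from the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PredictionTask:
    """Hypothetical description of a prediction request."""
    multi_chain: bool = False
    has_ligand_na_or_ion: bool = False
    deep_msa: bool = True
    on_premise_required: bool = False
    needs_affinity_ranking: bool = False
    antibody_antigen: bool = False
    membrane_protein: bool = False


def first_choice(task: PredictionTask) -> str:
    """Return the table's first-choice system for a task.

    Rules are checked from most to least specialized; this ordering is an
    assumption of the sketch, not something the table prescribes.
    """
    if task.membrane_protein:
        return "Any of the systems + MD + experimental data"
    if task.needs_affinity_ranking:
        return "Boltz-2 (with caution and orthogonal validation)"
    if task.antibody_antigen:
        return "AlphaFold 3 / Boltz / Chai with explicit caveats"
    if task.has_ligand_na_or_ion:
        if task.on_premise_required:
            return "Boltz-1 or Chai-1"
        return "AlphaFold 3 web server"
    if task.multi_chain:
        return "AlphaFold-Multimer"
    if task.deep_msa:
        return "AlphaFold 2 (or RoseTTAFold for open code)"
    return "ESMFold (or OmegaFold)"
```

For example, `first_choice(PredictionTask(has_ligand_na_or_ion=True, on_premise_required=True))` returns the open-source option, which makes the access constraint visible in the decision record.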

The AlphaFold 3 Open-Access Episode

AlphaFold 3 was published in Nature in May 2024 (Abramson et al., 2024). The initial release made the model available through a hosted server with rate limits, restricted commercial use, and no training code. The structural biology and ML communities pushed back, both on reproducibility grounds (a Nature paper without runnable training code is a hard precedent) and on operational grounds (research programs cannot use a server-only model for batch screening, IP-sensitive work, or air-gapped environments).

The response was infrastructural, not rhetorical:

Chai-1 (October 2024, Chai Discovery et al., 2024, preprint): Open-source AF3-class biomolecular interaction model.
Boltz-1 (November 2024, Wohlwend et al., 2024, preprint): MIT release with permissive license; positioned explicitly as democratizing AF3-class capability.
Boltz-2 (2025, Passaro et al., 2025, preprint): Adds binding-affinity prediction.
OpenFold continued to evolve as an open-source AF2-class system.

In October 2024, the Nobel Prize in Chemistry was awarded jointly to Demis Hassabis and John Jumper for AlphaFold and to David Baker for computational protein design. Coverage in Nature’s news (Naddaf, 2025) and Engineering commentary (Palmer, 2025) chronicled both the scientific recognition and the open-access debate.

The operational lesson for research programs: licensing and code availability are first-class evaluation criteria, not afterthoughts. A model that cannot run in your environment is, for your program, not the state of the art.

Confidence-Calibration Failure Modes

A list of pitfalls that recur in practice:

High pLDDT, wrong orientation. Two domains can each be confidently predicted while the PAE matrix shows inter-domain orientation is undetermined. The cartoon will look definitive; the science isn’t.
High ipTM, wrong interface in cells. A confident complex prediction can correspond to an interaction that does not occur at endogenous expression levels, in the relevant cellular compartment, or in the presence of competitor partners.
Training-distribution overlap. AlphaFold 2 was trained on PDB structures available at the time. Targets with close homologs in the training set are predicted with higher confidence than truly novel targets. Always check for near-neighbors in PDB before trusting a high score on a “novel” target.
Hallucinated ligand poses. AF3-class systems can produce confident-looking ligand placements for chemistry that is out-of-distribution. Treat ligand pose predictions as docking hypotheses, not as bound structures.
Disorder mis-read. Low-pLDDT regions are not “wrong”: they are predictions of disorder. Removing them and re-rendering the structure is a self-deception.
Single state, multi-state biology. GPCRs, kinases, ion channels, allosteric enzymes, and any protein with a meaningful conformational cycle are not described by a single structure, predicted or experimental.

Downstream Applications and Their Limits

Predicted structures feed into:

Variant effect prediction. AlphaMissense (Cheng et al., 2023) adapted AlphaFold to classify approximately 89% of the 71 million possible human missense variants as either likely benign or likely pathogenic. The classifier is a research tool; clinical use requires the standard regulatory pathway and independent validation. See Variant Effect Prediction.
Protein design. RFdiffusion (Watson et al., 2023) and related systems generate protein backbones with specified geometries; sequences for those backbones are typically designed in a separate step (for example, with ProteinMPNN). AlphaFold-predicted structures of the designs are an iteration-loop component, not a substitute for experimental characterization. See Protein Design and Engineering.
Structure-based drug design. Predicted structures support docking and pocket analysis but inherit all the conformational and ligand-chemistry limitations above. See Small Molecule Generation and ADMET.
Functional annotation. Predicted folds can suggest functional class by structural homology: useful for orphan proteins, but not a substitute for biochemical or genetic validation.

Frontier Directions

Two directions are worth tracking:

Generative protein modeling at scale. ESM-3 (Hayes et al., 2024, preprint), from EvolutionaryScale, is a multi-modal generative protein language model reasoning over sequence, structure, and function simultaneously. The relevance to prediction (as opposed to design) is the convergence: structure-prediction models and design models share a representational substrate. A practical implication is that future “prediction” workflows will increasingly produce ensembles and conditional generations, not single structures.
Affinity-aware interaction models. Boltz-2’s affinity head (Passaro et al., 2025, preprint) is an early example of folding-class models reaching into binding-affinity territory. Treat early claims with the same skepticism applied to any structure-then-ranking pipeline: docking-score correlation with experimental affinity is famously weak across the docking literature.

Practice Notes

Read pLDDT and PAE before you read the cartoon. Pretty structures persuade; metrics protect you.
For consequential decisions, compute multiple predictions (different seeds, MSA depths, or systems) and look at consensus. A prediction that flips under reasonable perturbation is a low-confidence prediction regardless of pLDDT.
For drug-discovery work, do not commit chemistry to a predicted pocket without (a) an experimental structure of a homolog or (b) explicit biochemical evidence that the pocket exists. Cryptic pockets are systematically under-represented in predicted structures.
For antibody-antigen work, treat any single prediction as a hypothesis-grade interface until validated by mutagenesis, alanine scanning, or epitope mapping.
For conformational-cycle proteins (GPCRs, kinases, transporters), predict multiple states (different templates, conformation-specific MSAs) and validate against any available state-specific experimental structures before functional interpretation.
License diligence is part of method selection. Server-only models are not deployable infrastructure; permissive open-source releases (Boltz, Chai, OpenFold) often are.
Cite the underlying paper and the model version. “AlphaFold” without a version is ambiguous; 2, 2.3, 3, and Multimer are different systems with different scope and different licenses.
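The consensus check recommended in these notes (multiple seeds, MSA depths, or systems) needs a way to compare predictions without first superimposing them. A minimal sketch follows, using distance RMSD over Cα coordinates, which compares internal distances and therefore needs no alignment. It assumes every model covers the same residues in the same order; the function names are hypothetical, and any cutoff for "too much disagreement" is left to the reader.

```python
import math
from itertools import combinations


def distance_rmsd(coords_a, coords_b):
    """Alignment-free distance RMSD (Å) between two CA coordinate lists.

    Each argument is a sequence of (x, y, z) tuples for the same residues
    in the same order. Caveat: the measure cannot tell mirror images apart.
    """
    if len(coords_a) != len(coords_b):
        raise ValueError("models must cover the same residues")
    if len(coords_a) < 2:
        raise ValueError("need at least two residues")
    total = 0.0
    n_pairs = 0
    for i, j in combinations(range(len(coords_a)), 2):
        d_a = math.dist(coords_a[i], coords_a[j])
        d_b = math.dist(coords_b[i], coords_b[j])
        total += (d_a - d_b) ** 2
        n_pairs += 1
    return math.sqrt(total / n_pairs)


def max_pairwise_drmsd(models):
    """Largest distance RMSD over all pairs of models (worst-case disagreement)."""
    if len(models) < 2:
        raise ValueError("need at least two models to assess consensus")
    return max(distance_rmsd(a, b) for a, b in combinations(models, 2))
```

A large worst-case value across seeds or systems is a signal to treat the prediction as low confidence for the decision at hand, whatever the per-residue pLDDT says. Restricting the coordinate lists to one domain or one interface localizes where the models disagree.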

Common Questions

Did AlphaFold “solve protein folding”?

For the operational definition Moult and CASP used (predict static three-dimensional structure from sequence for a typical well-folded protein domain at experimental accuracy), yes, for the majority of cases at CASP14 (Jumper et al., 2021). For the biological problem people sometimes mean (predict function, mechanism, conformational dynamics, and binding from sequence), no. The capability change is substantial; the framing matters.

Should I still pursue experimental structure determination?

Yes, for state-specific structures, complexes that AlphaFold’s confidence flags as uncertain, ligand-bound geometries that matter for chemistry, and any structure that will appear in a regulatory submission. Predicted structures are excellent first drafts and excellent screening tools. They are not substitutes for cryo-EM, X-ray, or NMR data that will support a publication or a development candidate.

Is AlphaFold 3 actually better than AlphaFold 2 for proteins-only tasks?

For single-chain protein-only predictions, AlphaFold 2 remains the well-characterized baseline. AlphaFold 3’s value is in the expanded chemical scope (NAs, ligands, ions, modifications). For a single-chain monomer where you do not need that scope, AlphaFold 2 is the simpler, better-understood, more reproducible choice.

How do I cite AlphaFold predictions in a paper?

Cite the underlying paper for the system used (Jumper et al., 2021; Abramson et al., 2024), the database paper if using AFDB models (Varadi et al., 2023; Varadi et al., 2021), and report version and confidence metrics (pLDDT distribution, PAE for relevant residue pairs) in the methods section. Treat predicted structures the way you would treat any computational result that informed an experimental design.

Can I use AlphaFold 3 commercially?

Read the current license before assuming. The terms have been a moving target since the May 2024 release. If commercial use is essential, the open-source AF3-class alternatives (Boltz-1, Chai-1) have permissive licenses that are easier to scope around.

Is structure prediction the same as protein design?

No. Prediction takes sequence and asks “what does this fold to?” Design takes a target geometry or function and asks “what sequence achieves this?” The systems share representational machinery and design pipelines often use prediction as a quality-control filter, but the validation evidence required is different. See Protein Design and Engineering.

Cross-References

AI for the Life Sciences: Evidence framework and the Demonstrated / Theoretical / Beyond tiers
Foundation Models for Biology: ESMFold and the protein language model lineage
Protein Design and Engineering: RFdiffusion, generation pipelines, and design-validate-iterate loops
Antibody and Biologic Design: Antibody-specific limitations of generic structure models
Variant Effect Prediction: AlphaMissense and structure-informed variant classification
Small Molecule Generation and ADMET: Structure-based design downstream of structure prediction
Evaluation Principles for Biomedical Discovery AI: How to design a validation plan that turns a prediction into a decision
Benchmarks for Bio AI: CASP, CAMEO, and the limits of static benchmarks