Protein Design and Engineering

Published: May 24, 2026

Protein design turns the structure problem around. Instead of asking what shape a sequence adopts, the design problem asks which sequence, scaffold, or assembly satisfies a target constraint.

Learning Objectives
  • Distinguish backbone generation, inverse folding, and function design.
  • Use experimental validation as the boundary between design and speculation.
  • Recognize why design tasks differ from natural structure prediction.

TL;DR

Protein design is strongest when the target is structurally specified and the success assay is direct. Claims become weaker as design moves from fold, to binding, to catalysis, to cellular phenotype.

Introduction

RFdiffusion and ProteinMPNN are central tools in the current design stack, and they are typically run in series. RFdiffusion generates protein backbones under structural constraints (Watson et al., 2023). ProteinMPNN then designs amino acid sequences for those target backbones (Dauparas et al., 2022). ESM3 added a further route: a multimodal protein language model that generates proteins conditioned on sequence, structure, and function (Hayes et al., 2025).

Demonstrated

Demonstrated capabilities include de novo protein backbone design, sequence design for specified structures, and experimental success in selected binder and fold design tasks. RFdiffusion produced experimentally tested designs across several categories (Watson et al., 2023). ProteinMPNN showed strong sequence recovery and practical design utility on fixed backbones (Dauparas et al., 2022). The ESM3 report in Science described the generation of esmGFP and its synthesis as a protein (Hayes et al., 2025).

Evidence Anchor | What It Supports | Practical Constraint
RFdiffusion | Backbone generation for design tasks | Function remains assay-dependent
ProteinMPNN | Sequence design for specified structures | Designed sequence quality depends on backbone realism
ESM3 | Multimodal protein generation with experimental protein synthesis example | General design reliability varies by target function
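
Sequence recovery, the metric mentioned above for fixed-backbone design, is commonly computed as the fraction of positions where the designed residue matches the native one. The following is a minimal sketch under that assumption; the function name and the two toy sequences are illustrative and do not come from any of the cited papers. Recovery is a computational score, not an assay result, so it informs but does not settle whether a design works.

```python
def sequence_recovery(native: str, designed: str) -> float:
    """Fraction of positions where the designed residue equals the native residue.

    Assumes both sequences describe the same fixed backbone, so they are
    compared position by position without alignment.
    """
    if len(native) != len(designed):
        raise ValueError("sequences must have the same length")
    if not native:
        raise ValueError("sequences must be non-empty")
    matches = sum(a == b for a, b in zip(native, designed))
    return matches / len(native)


# Toy sequences (made up for illustration): two substitutions out of ten positions.
native = "MKTAYIAKQR"
designed = "MKSAYLAKQR"
print(f"{sequence_recovery(native, designed):.2f}")  # prints 0.80
```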

Theoretical

Routine design of enzymes, switches, and therapeutic proteins from natural-language specifications remains theoretical. Existing workflows still require explicit constraints, expert review, and wet-lab iteration.

Beyond Current Capabilities

General function design, in which a text prompt produces a safe, manufacturable, active biologic without assay cycles, lies beyond current capabilities. Protein function is context-dependent and rarely reducible to static structure.

Practice Notes

  • Define success as an assay result, not a confidence score.
  • Check novelty, developability, immunogenicity signals, aggregation risk, and manufacturability.
  • Use negative design constraints when off-target binding or aggregation matters.
  • Document all sequence filters before synthesis orders or expression work.
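
One way to act on the last note is to record every filter, its threshold, and the computed value for each candidate before anything is ordered. The sketch below is a hypothetical illustration, not a validated developability screen: the hydrophobic residue set, both thresholds, the `FilterRecord` class, and the candidate sequences are all assumptions chosen for the example, and a real pipeline would use dedicated aggregation and immunogenicity predictors.

```python
import json
from dataclasses import dataclass, asdict

# Assumed residue set for a crude aggregation proxy (illustrative only).
HYDROPHOBIC = set("AVILMFWC")


@dataclass
class FilterRecord:
    name: str
    threshold: float
    value: float
    passed: bool


def hydrophobic_fraction(seq: str) -> float:
    """Share of residues that fall in the assumed hydrophobic set."""
    if not seq:
        raise ValueError("sequence must be non-empty")
    return sum(res in HYDROPHOBIC for res in seq) / len(seq)


def longest_hydrophobic_run(seq: str) -> int:
    """Length of the longest uninterrupted stretch of hydrophobic residues."""
    longest = current = 0
    for res in seq:
        current = current + 1 if res in HYDROPHOBIC else 0
        longest = max(longest, current)
    return longest


def apply_filters(seq: str, max_fraction: float = 0.5, max_run: int = 6) -> list:
    """Evaluate each filter and keep the threshold and value alongside the verdict."""
    frac = hydrophobic_fraction(seq)
    run = longest_hydrophobic_run(seq)
    return [
        FilterRecord("hydrophobic_fraction", max_fraction, frac, frac <= max_fraction),
        FilterRecord("longest_hydrophobic_run", max_run, run, run <= max_run),
    ]


# Made-up candidate sequences; names are placeholders.
candidates = {
    "design_001": "MKTAYIAKQRQISFVKSHFSRQ",
    "design_002": "MAVILLLVVAAIKKE",
}

# The log is the documentation: it can be saved next to the synthesis order.
log = {name: [asdict(r) for r in apply_filters(seq)] for name, seq in candidates.items()}
print(json.dumps(log, indent=2))
```

Keeping the threshold and the measured value in the same record makes it possible to revisit a rejected design later if the filter turns out to have been too strict. Passing these filters says nothing about function; that still has to come from the assay.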