Protein Design and Engineering
Protein design turns the structure problem around. Instead of asking what shape a sequence adopts, the design problem asks which sequence, scaffold, or assembly satisfies a target constraint.
- Distinguish backbone generation, inverse folding, and function design.
- Use experimental validation as the boundary between design and speculation.
- Recognize why design tasks differ from natural structure prediction.
Protein design is strongest when the target is structurally specified and the success assay is direct. Claims become weaker as design moves from fold, to binding, to catalysis, to cellular phenotype.
Introduction
RFdiffusion and ProteinMPNN are central tools in the current design stack. RFdiffusion generates protein backbones under structural constraints (Watson et al., 2023). ProteinMPNN designs amino acid sequences for target backbones (Dauparas et al., 2022). ESM3 added a multimodal protein language model route to generation conditioned on sequence, structure, and function (Hayes et al., 2025).
Demonstrated
Demonstrated capability includes de novo protein backbone design, sequence design for specified structures, and experimental success in selected binder and fold design tasks. RFdiffusion produced experimentally tested designs across several design categories (Watson et al., 2023). ProteinMPNN demonstrated strong sequence recovery and design utility for fixed backbones (Dauparas et al., 2022). ESM3 generated esmGFP, a fluorescent protein that was synthesized and experimentally characterized, as reported in Science (Hayes et al., 2025).
| Evidence Anchor | What It Supports | Practical Constraint |
|---|---|---|
| RFdiffusion | Backbone generation for design tasks | Function remains assay-dependent |
| ProteinMPNN | Sequence design for specified structures | Designed sequence quality depends on backbone realism |
| ESM3 | Multimodal protein generation, with one experimentally synthesized example (esmGFP) | General design reliability varies by target function |
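Sequence recovery, the metric cited for ProteinMPNN above, can be illustrated with a minimal sketch. It assumes the common definition, the fraction of backbone positions where the designed residue matches the native one, and it uses made-up toy sequences. The function name `sequence_recovery` is hypothetical and is not part of any ProteinMPNN API.

```python
def sequence_recovery(designed: str, native: str) -> float:
    """Fraction of positions where the designed residue equals the native residue.

    Assumes both sequences describe the same fixed backbone, so they are
    already aligned position by position and have equal length.
    """
    if len(designed) != len(native):
        raise ValueError("sequences must have equal length (same fixed backbone)")
    if not native:
        raise ValueError("sequences must be non-empty")
    matches = sum(d == n for d, n in zip(designed, native))
    return matches / len(native)


# Toy sequences, invented for illustration only.
native = "MKTAYIAKQR"
designed = "MKSAYLAKQR"  # differs from native at positions 3 and 6
print(sequence_recovery(designed, native))  # 0.8
```

High recovery shows agreement with a native sequence on that backbone. It is a computational proxy, so it does not replace an expression or binding assay.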
Theoretical
Theoretical capability includes routine design of enzymes, switches, and therapeutic proteins from natural-language specifications. Existing workflows still require explicit constraints, expert review, and wet-lab iteration.
Beyond Current Capabilities
Beyond current capabilities lies general function design, in which a text prompt yields a safe, manufacturable, active biologic without assay cycles. Protein function is context-dependent and rarely reducible to static structure.
Practice Notes
- Define success as an assay result, not a confidence score.
- Check novelty, developability, immunogenicity signals, aggregation risk, and manufacturability.
- Use negative design constraints when off-target binding or aggregation matters.
- Document all sequence filters before placing synthesis orders or starting expression work.
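One way to act on the last note is to declare filters as data, so the same definitions are both applied to candidates and written out as a record before anything is ordered. The sketch below is illustrative only: the filter names, the hydrophobic residue set, and the thresholds are assumptions chosen for the example, not validated developability criteria.

```python
import json
from dataclasses import dataclass
from typing import Callable

# Illustrative residue set; a real pipeline would choose its own definition.
HYDROPHOBIC = frozenset("AVILMFWY")


def hydrophobic_fraction(seq: str) -> float:
    """Share of residues in the illustrative hydrophobic set."""
    return sum(res in HYDROPHOBIC for res in seq) / len(seq)


def cysteine_count(seq: str) -> float:
    """Number of cysteines, returned as float for uniform comparison."""
    return float(seq.count("C"))


@dataclass(frozen=True)
class SequenceFilter:
    name: str
    metric: Callable[[str], float]
    max_value: float  # hypothetical threshold, not a validated cutoff
    rationale: str


FILTERS = [
    SequenceFilter("hydrophobic_fraction", hydrophobic_fraction, 0.50,
                   "crude aggregation-risk proxy"),
    SequenceFilter("cysteine_count", cysteine_count, 2.0,
                   "limit unintended disulfides"),
]


def filter_manifest() -> str:
    """Serialize filter names, thresholds, and rationales for the record."""
    return json.dumps(
        [{"name": f.name, "max_value": f.max_value, "rationale": f.rationale}
         for f in FILTERS],
        indent=2,
    )


def evaluate(seq: str) -> dict:
    """Apply every declared filter and keep per-filter values for auditing."""
    if not seq:
        raise ValueError("sequence must be non-empty")
    checks = {}
    for f in FILTERS:
        value = f.metric(seq)
        checks[f.name] = {"value": value, "max": f.max_value,
                          "passed": value <= f.max_value}
    return {"sequence": seq, "checks": checks,
            "passed": all(c["passed"] for c in checks.values())}


# Toy candidates, invented for illustration only.
candidates = ["MKTAYIAKQRDE", "AVILMFWYAVIL", "MCCKCTDE"]
print(filter_manifest())
for seq in candidates:
    print(evaluate(seq)["passed"], seq)
```

Because the manifest is generated from the same `FILTERS` list that drives `evaluate`, the written record cannot drift from what was actually applied. Passing these filters says nothing about function, which remains an assay result.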