Informed training set design enables efficient machine learning-assisted directed protein evolution

Times cited: 80
Authors
Wittmann, Bruce J. [1]
Yue, Yisong [2]
Arnold, Frances H. [1,3]
Affiliations
[1] CALTECH, Div Biol & Biol Engn, MC 210-41, 1200 E Calif Blvd, Pasadena, CA 91125 USA
[2] CALTECH, Dept Comp & Math Sci, MC 305-16, 1200 E Calif Blvd, Pasadena, CA 91125 USA
[3] CALTECH, Div Chem & Chem Engn, MC 210-41, 1200 E Calif Blvd, Pasadena, CA 91125 USA
Keywords
FITNESS LANDSCAPE; EPISTASIS; DATABASE
DOI
10.1016/j.cels.2021.07.008
Chinese Library Classification (CLC)
Q5 [Biochemistry]; Q7 [Molecular Biology]
Subject classification codes
071010; 081704
Abstract
Directed evolution of proteins often involves a greedy optimization in which the mutation in the highest-fitness variant identified in each round of single-site mutagenesis is fixed. The efficiency of such a single-step greedy walk depends on the order in which beneficial mutations are identified; that is, the process is path dependent. Here, we investigate and optimize a path-independent machine learning-assisted directed evolution (MLDE) protocol that allows in silico screening of full combinatorial libraries. In particular, we evaluate how protein encoding strategies, training procedures, models, and training set design strategies affect MLDE outcome, and find that the most important consideration is implementing strategies that reduce the inclusion of minimally informative "holes" (protein variants with zero or extremely low fitness) in training data. When applied to an epistatic, hole-filled, four-site combinatorial fitness landscape, our optimized protocol achieved the global fitness maximum up to 81-fold more frequently than single-step greedy optimization. A record of this paper's transparent peer review process is included in the supplemental information.
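To illustrate the path dependence described in the abstract, the following minimal Python sketch runs a single-step greedy walk on a hypothetical two-site toy landscape. The alphabet `ABC`, the fitness values, and the names `greedy_walk` and `exhaustive_screen` are invented for illustration and are not the authors' data or code. The walk here optimizes one site per round in a chosen order, which is only one simple way to model the procedure; the exhaustive screen ranks every combination using the true toy fitness values, standing in for the model predictions that MLDE would use to screen a full combinatorial library in silico.

```python
from itertools import product

# Hypothetical toy landscape: 2 sites, 3 residue choices per site.
# All fitness values are invented; "BB" and "CB" act as holes (zero fitness).
ALPHABET = "ABC"
FITNESS = {
    "AA": 1.0, "BA": 2.0, "CA": 0.5,
    "AB": 0.5, "BB": 0.0, "CB": 0.0,
    "AC": 1.5, "BC": 2.5, "CC": 4.0,
}


def greedy_walk(parent: str, site_order: list[int]) -> str:
    """Optimize one site per round, fixing the best substitution found there."""
    current = parent
    for site in site_order:
        # Single-site saturation at `site`, holding all other sites fixed.
        candidates = [current[:site] + aa + current[site + 1:] for aa in ALPHABET]
        current = max(candidates, key=FITNESS.__getitem__)
    return current


def exhaustive_screen(n_sites: int) -> str:
    """Rank every combination in the library and return the best variant.

    MLDE would rank variants by model predictions; here the true toy
    fitness values stand in for a perfect model.
    """
    variants = ("".join(combo) for combo in product(ALPHABET, repeat=n_sites))
    return max(variants, key=FITNESS.__getitem__)


if __name__ == "__main__":
    for order in ([0, 1], [1, 0]):
        best = greedy_walk("AA", order)
        print(order, best, FITNESS[best])
    top = exhaustive_screen(2)
    print("exhaustive", top, FITNESS[top])
```

With these invented values, optimizing site 0 first fixes `B` and ends at `BC` (fitness 2.5), while optimizing site 1 first reaches the global maximum `CC` (fitness 4.0); ranking every combination finds `CC` regardless of order.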
Pages: 1026-+
Page count: 28
Related papers (50 in total)
  • [1] Machine learning-assisted directed protein evolution with combinatorial libraries
    Wu, Zachary
    Kan, S. B. Jennifer
    Lewis, Russell D.
    Wittmann, Bruce J.
    Arnold, Frances H.
    [J]. PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 2019, 116 (18): 8852-8858
  • [2] Cluster learning-assisted directed evolution
    Qiu, Yuchi
    Hu, Jian
    Wei, Guo-Wei
    [J]. NATURE COMPUTATIONAL SCIENCE, 2021, 1 (12): 809-818
  • [3] Cluster learning-assisted directed evolution
    Qiu, Yuchi
    Hu, Jian
    Wei, Guo-Wei
    [J]. NATURE COMPUTATIONAL SCIENCE, 2021, 1: 809-818
  • [4] Machine learning-assisted chemical design of highly efficient deicers
    Ito, Kai
    Fukatsu, Arisa
    Okada, Kenji
    Takahashi, Masahide
    [J]. SCIENTIFIC REPORTS, 2024, 14 (01)
  • [5] Machine learning-assisted directed protein evolution with combinatorial libraries (vol 116, pg 8852, 2019)
    Wu, Zachary
    Kan, S. B. Jennifer
    Lewis, Russell D.
    Wittmann, Bruce J.
    Arnold, Frances H.
    [J]. PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 2020, 117 (01): 788-789
  • [6] Machine learning-assisted solvent molecule design for efficient absorption of ethanethiol
    Chen, Yuxiang
    Liu, Chuanlei
    Gong, Zijun
    Zhao, Qiyue
    Guo, Guanchu
    Jiang, Hao
    Sun, Hui
    Shen, Benxian
    [J]. HUAGONG XUEBAO/CIESC JOURNAL, 2024, 75 (03): 914-923
  • [7] Methanol tolerance upgrading of Proteus mirabilis lipase by machine learning-assisted directed evolution
    Ma, Rui
    Li, Yingnan
    Zhang, Meng
    Xu, Fei
    [J]. SYSTEMS MICROBIOLOGY AND BIOMANUFACTURING, 2023, 3 (03): 427-439
  • [8] Machine Learning-Assisted Design of Material Properties
    Kadulkar, Sanket
    Sherman, Zachary M.
    Ganesan, Venkat
    Truskett, Thomas M.
    [J]. ANNUAL REVIEW OF CHEMICAL AND BIOMOLECULAR ENGINEERING, 2022, 13: 235-254
  • [9] Machine Learning-Assisted Modeling in Antenna Array Design
    Wu, Qi
    Chen, Weiqi
    Li, Yuefeng
    Wang, Haiming
    Yin, Jiexi
    Yin, Weishuang
    [J]. 2024 IEEE INTERNATIONAL WORKSHOP ON ANTENNA TECHNOLOGY, IWAT, 2024: 92-93
  • [10] Machine Learning-Assisted Low-Dimensional Electrocatalysts Design for Hydrogen Evolution Reaction
    Li, Jin
    Wu, Naiteng
    Zhang, Jian
    Wu, Hong-Hui
    Pan, Kunming
    Wang, Yingxue
    Liu, Guilong
    Liu, Xianming
    Yao, Zhenpeng
    Zhang, Qiaobao
    [J]. NANO-MICRO LETTERS, 2023, 15 (12): 169-195