Informed training set design enables efficient machine learning-assisted directed protein evolution

Times cited: 80
Authors
Wittmann, Bruce J. [1 ]
Yue, Yisong [2 ]
Arnold, Frances H. [1 ,3 ]
Affiliations
[1] CALTECH, Div Biol & Biol Engn, MC 210-41,1200 E Calif Blvd, Pasadena, CA 91125 USA
[2] CALTECH, Dept Comp & Math Sci, MC 305-16,1200 E Calif Blvd, Pasadena, CA 91125 USA
[3] CALTECH, Div Chem & Chem Engn, MC 210-41,1200 E Calif Blvd, Pasadena, CA 91125 USA
Keywords
FITNESS LANDSCAPE; EPISTASIS; DATABASE;
DOI
10.1016/j.cels.2021.07.008
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular biology];
Subject classification code
071010; 081704;
Abstract
Directed evolution of proteins often involves a greedy optimization in which the mutation in the highest-fitness variant identified in each round of single-site mutagenesis is fixed. The efficiency of such a single-step greedy walk depends on the order in which beneficial mutations are identified; the process is path dependent. Here, we investigate and optimize a path-independent machine learning-assisted directed evolution (MLDE) protocol that allows in silico screening of full combinatorial libraries. In particular, we evaluate the importance of different protein encoding strategies, training procedures, models, and training set design strategies on MLDE outcome, finding the most important consideration to be the implementation of strategies that reduce inclusion of minimally informative "holes" (protein variants with zero or extremely low fitness) in training data. When applied to an epistatic, hole-filled, four-site combinatorial fitness landscape, our optimized protocol achieved the global fitness maximum up to 81-fold more frequently than single-step greedy optimization. A record of this paper's transparent peer review process is included in the supplemental information.
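The path dependence of a single-step greedy walk can be illustrated with a small sketch. The functions `greedy_walk` and `exhaustive_best` and the two-site toy landscape `TOY` below are hypothetical illustrations, not the authors' code or data. `exhaustive_best` scores every member of the combinatorial library with the true fitness function, standing in for an idealized model; in MLDE proper, a model trained on a sampled subset would supply the scores instead.

```python
from itertools import product
from typing import Callable, Dict, Sequence, Tuple

Variant = Tuple[str, ...]
Fitness = Callable[[Variant], float]


def greedy_walk(parent: Variant, alphabet: Sequence[str],
                fitness: Fitness) -> Tuple[Variant, int]:
    """Single-step greedy walk: in each round, evaluate every single
    substitution at the positions not yet fixed, then fix the position
    (and residue) of the best variant found. Returns the final variant
    and the number of fitness evaluations (repeats of the current
    variant are counted)."""
    if not alphabet:
        raise ValueError("alphabet must be non-empty")
    current = tuple(parent)
    unfixed = set(range(len(current)))
    n_evals = 0
    while unfixed:
        best_pos, best_var, best_fit = -1, current, float("-inf")
        for pos in sorted(unfixed):
            for aa in alphabet:
                candidate = current[:pos] + (aa,) + current[pos + 1:]
                n_evals += 1
                score = fitness(candidate)
                if score > best_fit:  # strict: ties keep the first found
                    best_pos, best_var, best_fit = pos, candidate, score
        current = best_var           # fix the winning mutation
        unfixed.discard(best_pos)    # that position is no longer mutated
    return current, n_evals


def exhaustive_best(n_sites: int, alphabet: Sequence[str],
                    fitness: Fitness) -> Variant:
    """Path-independent screen of the full combinatorial library; here the
    true fitness function stands in for an idealized trained model."""
    return max(product(alphabet, repeat=n_sites), key=fitness)


# Hypothetical epistatic 2-site landscape over a 3-letter alphabet.
# "BB" is a hole; the global maximum "CC" is reached only via "AC" or "CA".
TOY: Dict[Variant, float] = {
    tuple(k): v for k, v in {
        "AA": 1.0, "AB": 0.6, "AC": 1.1,
        "BA": 1.3, "BB": 0.0, "BC": 0.2,
        "CA": 0.5, "CB": 0.1, "CC": 3.0,
    }.items()
}

if __name__ == "__main__":
    print(greedy_walk(("A", "A"), "ABC", TOY.__getitem__))  # (('B', 'A'), 9)
    print(exhaustive_best(2, "ABC", TOY.__getitem__))       # ('C', 'C')
```

Starting from `AA`, the greedy walk fixes `B` at the first site because `BA` is the best single mutant, and from there `CC` is no longer reachable; starting from `AC` instead, the same procedure does reach `CC`. The exhaustive screen finds `CC` regardless of starting point, which is the property the abstract attributes to in silico screening of the full library.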
Pages: 1026 / +
Number of pages: 28