Nucleotide augmentation for machine learning-guided protein engineering

Cited by: 3
Authors
Minot, Mason [1]
Reddy, Sai T. [1]
Affiliations
[1] Swiss Fed Inst Technol, Dept Biosyst Sci & Engn, CH-4058 Basel, Switzerland
Source
BIOINFORMATICS ADVANCES | 2023 / Volume 3 / Issue 01
Keywords
REGULARIZATION; SELECTION;
DOI
10.1093/bioadv/vbac094
Chinese Library Classification number
Q [Biological Sciences];
Discipline classification codes
07; 0710; 09;
Abstract
Machine learning-guided protein engineering is a rapidly advancing field. Despite major experimental and computational advances, collecting protein genotype (sequence) and phenotype (function) data remains time- and resource-intensive. As a result, the quality and quantity of training data are often a limiting factor in developing machine learning models. Data augmentation techniques have been successfully applied in computer vision and natural language processing; however, such techniques are lacking for biological sequence data. To this end, we develop nucleotide augmentation (NTA), which leverages natural codon degeneracy to augment protein sequence data via synonymous codon substitution. As a proof of concept for protein engineering, we test several online and offline augmentation implementations to train machine learning models on benchmark datasets of protein genotype and phenotype, revealing performance on par with or surpassing that of benchmark models while using a fraction of the training data. NTA also enables substantial improvements for classification tasks under heavy class imbalance.
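The core idea of synonymous codon substitution can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `translate` and `augment_protein` are hypothetical, codons are sampled uniformly (an assumption), and the codon table is the standard genetic code. Each protein is encoded as several different nucleotide sequences that all translate back to the same protein, so every augmented sample keeps the original label.

```python
import random
from itertools import product

# Standard genetic code, codons enumerated with bases in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Invert the table: each amino acid maps to its synonymous codons.
AA_TO_CODONS = {}
for codon, aa in CODON_TO_AA.items():
    AA_TO_CODONS.setdefault(aa, []).append(codon)


def translate(nt_seq):
    """Translate a nucleotide sequence (length divisible by 3) to protein."""
    return "".join(
        CODON_TO_AA[nt_seq[i:i + 3]] for i in range(0, len(nt_seq), 3)
    )


def augment_protein(protein, n_aug, rng=None):
    """Return n_aug nucleotide encodings of `protein` (hypothetical helper).

    Each residue is replaced by a codon drawn uniformly from its synonymous
    codons. Duplicates are possible, especially for short sequences.
    """
    rng = rng or random.Random()
    return [
        "".join(rng.choice(AA_TO_CODONS[aa]) for aa in protein)
        for _ in range(n_aug)
    ]


if __name__ == "__main__":
    for nt in augment_protein("MKTLW", 3, random.Random(0)):
        print(nt, translate(nt))
```

Under this sketch, generating all variants once before training would correspond to an offline scheme, while calling `augment_protein` anew for each batch would correspond to an online one. How the paper's implementations differ in detail is not stated in the abstract.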
Pages: 10