Nucleotide augmentation for machine learning-guided protein engineering

Cited by: 3
Authors
Minot, Mason [1]
Reddy, Sai T. [1]
Affiliations
[1] Swiss Fed Inst Technol, Dept Biosyst Sci & Engn, CH-4058 Basel, Switzerland
Source
BIOINFORMATICS ADVANCES | 2023 / Vol. 3 / Issue 01
Keywords
REGULARIZATION; SELECTION;
DOI
10.1093/bioadv/vbac094
Chinese Library Classification number
Q [Biological Sciences];
Discipline classification code
07 ; 0710 ; 09 ;
Abstract
Machine learning-guided protein engineering is a rapidly advancing field. Despite major experimental and computational advances, collecting protein genotype (sequence) and phenotype (function) data remains time- and resource-intensive. As a result, the quality and quantity of training data are often a limiting factor in developing machine learning models. Data augmentation techniques have been successfully applied to the fields of computer vision and natural language processing; however, there is a lack of such augmentation techniques for biological sequence data. Towards this end, we develop nucleotide augmentation (NTA), which leverages natural nucleotide codon degeneracy to augment protein sequence data via synonymous codon substitution. As a proof of concept for protein engineering, we test several online and offline augmentation implementations to train machine learning models with benchmark datasets of protein genotype and phenotype, revealing performance on par with or surpassing that of benchmark models while using a fraction of the training data. NTA also enables substantial improvements for classification tasks under heavy class imbalance.
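The core idea of synonymous codon substitution can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names (`translate`, `nucleotide_augment`), the uniform sampling over synonymous codons, and the attempt limit are assumptions made for illustration only. The sketch uses the standard genetic code and generates several distinct nucleotide sequences that all encode the same protein, so each augmented sequence can keep the phenotype label of the original.

```python
import random
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G ('*' marks stop codons).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Invert the table: each amino acid maps to all of its synonymous codons.
AA_TO_CODONS = {}
for codon, aa in CODON_TO_AA.items():
    AA_TO_CODONS.setdefault(aa, []).append(codon)


def translate(nt_seq):
    """Translate a nucleotide sequence (length divisible by 3) into a protein."""
    return "".join(CODON_TO_AA[nt_seq[i:i + 3]] for i in range(0, len(nt_seq), 3))


def nucleotide_augment(protein, n_variants, rng):
    """Return up to n_variants distinct nucleotide encodings of `protein`.

    Hypothetical helper: each codon is drawn uniformly from the synonymous
    codons of the residue, which is an assumption of this sketch.
    """
    variants = set()
    for _ in range(20 * n_variants):  # bounded number of sampling attempts
        if len(variants) >= n_variants:
            break
        variants.add("".join(rng.choice(AA_TO_CODONS[aa]) for aa in protein))
    return sorted(variants)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed for reproducibility
    for nt in nucleotide_augment("MKVLA", n_variants=5, rng=rng):
        print(nt, "->", translate(nt))
```

In this framing, an offline variant would call such a function once to expand the training set before training, while an online variant would sample a fresh encoding each time a sequence is drawn into a batch; how the paper implements either mode is not specified in the abstract.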
Pages: 10