Nucleotide augmentation for machine learning-guided protein engineering

Cited by: 3
Authors
Minot, Mason [1]
Reddy, Sai T. [1]
Affiliation
[1] Swiss Fed Inst Technol, Dept Biosyst Sci & Engn, CH-4058 Basel, Switzerland
Source
BIOINFORMATICS ADVANCES | 2023 / Vol. 3 / Issue 01
Keywords
REGULARIZATION; SELECTION
DOI
10.1093/bioadv/vbac094
Chinese Library Classification number
Q [Biological Sciences]
Discipline classification code
07; 0710; 09
Abstract
Machine learning-guided protein engineering is a rapidly advancing field. Despite major experimental and computational advances, collecting protein genotype (sequence) and phenotype (function) data remains time- and resource-intensive. As a result, the quality and quantity of training data are often limiting factors in developing machine learning models. Data augmentation techniques have been applied successfully in computer vision and natural language processing; however, comparable augmentation techniques for biological sequence data are lacking. To this end, we develop nucleotide augmentation (NTA), which leverages the natural degeneracy of the genetic code to augment protein sequence data via synonymous codon substitution. As a proof of concept for protein engineering, we test several online and offline augmentation implementations to train machine learning models on benchmark datasets of protein genotype and phenotype, and find performance on par with, or surpassing, that of benchmark models while using only a fraction of the training data. NTA also enables substantial improvements for classification tasks under heavy class imbalance.
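The idea of synonymous codon substitution described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names (`translate`, `augment`), the uniform sampling of codons, and the example peptide are assumptions made for illustration only. The sketch uses the standard genetic code to map each amino acid to its synonymous codons, then samples several distinct-looking nucleotide encodings of the same protein, all of which translate back to the identical amino acid sequence and would therefore share the same label.

```python
import itertools
import random

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TO_AA = {
    "".join(codon): aa
    for codon, aa in zip(itertools.product(BASES, repeat=3), AMINO_ACIDS)
}

# Invert the table: amino acid -> list of synonymous codons.
AA_TO_CODONS = {}
for codon, aa in CODON_TO_AA.items():
    AA_TO_CODONS.setdefault(aa, []).append(codon)


def translate(nt_seq):
    """Translate a nucleotide sequence (length divisible by 3) to protein."""
    return "".join(CODON_TO_AA[nt_seq[i:i + 3]] for i in range(0, len(nt_seq), 3))


def augment(protein, n_variants, seed=0):
    """Return n_variants nucleotide encodings of `protein`.

    Assumption: codons are drawn uniformly at random among synonyms;
    other sampling schemes are possible.
    """
    rng = random.Random(seed)
    return [
        "".join(rng.choice(AA_TO_CODONS[aa]) for aa in protein)
        for _ in range(n_variants)
    ]


# Hypothetical example peptide; every variant encodes the same protein.
variants = augment("MKWLSR", n_variants=5)
for nt in variants:
    print(nt, "->", translate(nt))
```

Under this reading, calling `augment` once before training would correspond to an offline scheme, while re-sampling codons each time a sequence is drawn for a batch would correspond to an online scheme; how the paper implements either variant is not specified in this record.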
Pages: 10