Nucleotide augmentation for machine learning-guided protein engineering

Times cited: 3
Authors
Minot, Mason [1]
Reddy, Sai T. [1]
Affiliation
[1] Swiss Fed Inst Technol, Dept Biosyst Sci & Engn, CH-4058 Basel, Switzerland
Source
BIOINFORMATICS ADVANCES | 2023 / Vol. 3 / Issue 01
Keywords
REGULARIZATION; SELECTION
DOI
10.1093/bioadv/vbac094
Chinese Library Classification (CLC) number
Q [Biological sciences]
Subject classification codes
07; 0710; 09
Abstract
Machine learning-guided protein engineering is a rapidly advancing field. Despite major experimental and computational advances, collecting protein genotype (sequence) and phenotype (function) data remains time- and resource-intensive. As a result, the quality and quantity of training data are often a limiting factor in developing machine learning models. Data augmentation techniques have been successfully applied in computer vision and natural language processing; however, comparable augmentation techniques for biological sequence data are lacking. To this end, we develop nucleotide augmentation (NTA), which leverages natural nucleotide codon degeneracy to augment protein sequence data via synonymous codon substitution. As a proof of concept for protein engineering, we test several online and offline augmentation implementations to train machine learning models on benchmark datasets of protein genotype and phenotype, revealing performance on par with or surpassing that of benchmark models while using a fraction of the training data. NTA also enables substantial improvements for classification tasks under heavy class imbalance.
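The core idea of synonymous codon substitution can be illustrated with a short sketch: each amino acid is replaced by a randomly chosen codon that encodes it, so one labeled protein sequence yields many distinct nucleotide sequences that all carry the same label. This is a minimal illustration, not the authors' implementation. It assumes the standard genetic code (NCBI translation table 1) and uniform sampling among synonymous codons, and the function names (`back_translate`, `augment_offline`, `online_epochs`, `n_encodings`) are hypothetical. "Offline" is taken here to mean generating the augmented set once before training, and "online" to mean drawing fresh nucleotide variants every epoch; the paper's exact definitions and settings may differ.

```python
import itertools
import math
import random

# Standard genetic code (NCBI translation table 1). Codons are enumerated in
# TCAG order (TTT, TTC, TTA, TTG, TCT, ...); '*' marks the three stop codons.
_BASES = "TCAG"
_AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(itertools.product(_BASES, repeat=3), _AMINO_ACIDS)
}

# Invert the table: amino acid -> list of synonymous codons (codon degeneracy).
SYNONYMOUS_CODONS = {}
for _codon, _aa in CODON_TABLE.items():
    SYNONYMOUS_CODONS.setdefault(_aa, []).append(_codon)


def translate(dna):
    """Translate a DNA string (length divisible by 3) into a protein string."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def back_translate(protein, rng):
    """Return one DNA sequence encoding `protein`, choosing each codon
    uniformly at random among the synonymous codons for that residue."""
    return "".join(rng.choice(SYNONYMOUS_CODONS[aa]) for aa in protein)


def n_encodings(protein):
    """Number of distinct DNA sequences that encode `protein`."""
    return math.prod(len(SYNONYMOUS_CODONS[aa]) for aa in protein)


def augment_offline(proteins, labels, n_aug, seed=0):
    """Hypothetical offline variant: expand the dataset once before training,
    drawing `n_aug` nucleotide variants per protein; each variant keeps the
    label of its parent protein. Variants are drawn with replacement, so
    duplicates are possible for short or weakly degenerate sequences."""
    rng = random.Random(seed)
    seqs, ys = [], []
    for protein, label in zip(proteins, labels):
        for _ in range(n_aug):
            seqs.append(back_translate(protein, rng))
            ys.append(label)
    return seqs, ys


def online_epochs(proteins, labels, n_epochs, seed=0):
    """Hypothetical online variant: yield a freshly back-translated copy of
    the training set at every epoch, so the model sees new encodings each time."""
    rng = random.Random(seed)
    for _ in range(n_epochs):
        yield [back_translate(p, rng) for p in proteins], list(labels)


if __name__ == "__main__":
    # Toy example with two made-up labeled peptides.
    seqs, ys = augment_offline(["MKV", "MWC"], [1, 0], n_aug=3)
    for s, y in zip(seqs, ys):
        print(s, translate(s), y)
    print(n_encodings("MKV"))  # 1 (M) x 2 (K) x 4 (V) = 8
```

Because each residue with k synonymous codons multiplies the number of possible encodings by k, the augmentation space grows exponentially with sequence length, while residues such as M and W (one codon each) contribute no variation. Before training, the resulting DNA strings would still need a numerical encoding (for example one-hot over A, C, G, T); that choice is left open here.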
Pages: 10