Pathogenicity Prediction of Single Amino Acid Variants With Machine Learning Model Based on Protein Structural Energies

Authors
Wu, Tzu-Hsuan [1]
Lin, Peng-Chan [2]
Chou, Hsin-Hung [3]
Shen, Meng-Ru [4]
Hsieh, Sun-Yuan [1, 5]
Affiliations
[1] Natl Cheng Kung Univ, Inst Med Informat, Tainan 701, Taiwan
[2] Natl Cheng Kung Univ Hosp, Dept Comp Sci & Informat Engn, Dept Internal Med, Tainan 704, Taiwan
[3] Natl Chi Nan Univ, Dept Comp Sci & Informat Engn, Puli Township 54516, Nantou County, Taiwan
[4] Natl Cheng Kung Univ, Dept Obstet & Gynecol, Dept Pharmacol, Coll Med, Tainan 701, Taiwan
[5] Natl Cheng Kung Univ, Inst Mfg Informat Syst, Dept Comp Sci & Informat Engn, Tainan 701, Taiwan
Keywords
Machine learning; pathogenicity prediction; protein structure energy; single amino acid variants; SNP; MUTATIONS; POLYMORPHISMS;
DOI
10.1109/TCBB.2021.3139048
Chinese Library Classification
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
The most popular tools for predicting the pathogenicity of single amino acid variants (SAVs) were built on sequence-based techniques. SAVs may change protein structure and function. In the context of van der Waals force and disulfide bridge calculations, no method directly predicts the impact of mutations on the energies of the protein structure. Here, we combined machine learning methods with energy scores of protein structures calculated by the Rosetta Energy Function 2015 to predict SAV pathogenicity. The accuracy of our model (0.76) is higher than that of six prediction tools. Further analyses revealed that the differences in reference energies, attractive energies, and solvation of polar atoms between wild-type and mutant side chains played essential roles in distinguishing benign from pathogenic variants. These features reflect the physicochemical properties of amino acids, which are observed in 3D structures rather than in sequences. We added 16 features to Rhapsody (the prediction tool we used for our data set) and consequently improved its performance. The results indicate that these energy scores are more appropriate and more detailed representations of the pathogenicity of SAVs.
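The "differential" energies mentioned in the abstract can be read as per-term differences between the mutant and wild-type structure scores. The following is a minimal sketch of that idea only, not the authors' pipeline: the function name `delta_energy_features`, the chosen subset of score-term names, and all numeric values are assumptions made for illustration, and the abstract does not list the actual features used.

```python
# Assumed subset of Rosetta REF2015 score-term names (attractive and repulsive
# van der Waals, polar solvation, disulfide, reference energy). The abstract
# does not state which terms the model actually uses.
TERMS = ("fa_atr", "fa_rep", "lk_ball_wtd", "dslf_fa13", "ref")


def delta_energy_features(wild_type, mutant, terms=TERMS):
    """Return mutant-minus-wild-type differences for each score term."""
    return {f"d_{term}": mutant[term] - wild_type[term] for term in terms}


# Placeholder numbers for illustration only; these are not real Rosetta output.
wt_scores = {"fa_atr": -5.0, "fa_rep": 1.0, "lk_ball_wtd": -0.5,
             "dslf_fa13": 0.0, "ref": 1.5}
mut_scores = {"fa_atr": -3.5, "fa_rep": 1.25, "lk_ball_wtd": 0.0,
              "dslf_fa13": 0.0, "ref": 0.5}

features = delta_energy_features(wt_scores, mut_scores)
print(features)
```

A feature dictionary of this shape could then be passed, one row per variant, to any standard classifier; which classifier the paper uses is not stated in the abstract.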
Pages: 606-615
Page count: 10