A novel sequence-based predictor for identifying and characterizing thermophilic proteins using estimated propensity scores of dipeptides

Authors
Phasit Charoenkwan
Warot Chotpatiwetchkul
Vannajan Sanghiran Lee
Chanin Nantasenamat
Watshara Shoombuatong
Affiliations
[1] Chiang Mai University, Modern Management and Information Technology, College of Arts, Media and Technology
[2] King Mongkut’s Institute of Technology Ladkrabang, Applied Computational Chemistry Research Unit, Department of Chemistry, School of Science
[3] University of Malaya, Department of Chemistry, Centre of Theoretical and Computational Physics, Faculty of Science
[4] Mahidol University, Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology
Abstract
Owing to their ability to maintain a thermodynamically stable fold at extremely high temperatures, thermophilic proteins (TPPs) play a critical role in basic research and in a variety of applications in the food industry. As a result, the development of computational models for rapidly and accurately identifying novel TPPs from a large number of uncharacterized protein sequences is desirable. Although computational models have already been developed for characterizing thermophilic proteins, their performance and interpretability remain unsatisfactory. We present a novel sequence-based thermophilic protein predictor, termed SCMTPP, for improving model predictability and interpretability. First, an up-to-date and high-quality dataset consisting of 1853 TPPs and 3233 non-TPPs was compiled from published literature. Second, the SCMTPP predictor was created by combining the scoring card method (SCM) with estimated propensity scores of g-gap dipeptides. Benchmarking experiments revealed that SCMTPP had a cross-validation accuracy of 0.883, which was comparable to that of a support vector machine-based predictor (0.906–0.910) and 2–17% higher than that of commonly used machine learning models. Furthermore, SCMTPP outperformed the state-of-the-art approach (ThermoPred) on the independent test dataset, with accuracy and MCC of 0.865 and 0.731, respectively. Finally, the SCMTPP-derived propensity scores were used to elucidate the critical physicochemical properties for protein thermostability enhancement. In terms of interpretability and generalizability, comparative results showed that SCMTPP was effective for identifying and characterizing TPPs. We have implemented the proposed predictor as a user-friendly online web server at http://pmlabstack.pythonanywhere.com/SCMTPP to allow easy access to the model.
SCMTPP is expected to be a powerful tool for facilitating community-wide efforts to identify TPPs on a large scale and guiding experimental characterization of TPPs.
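The scoring idea described in the abstract can be illustrated with a minimal sketch. It assumes that a sequence's score is the composition-weighted sum of per-dipeptide propensity scores and that a sequence is called thermophilic when the score exceeds a threshold. The function names, the example propensity values, and the threshold are hypothetical and are not taken from SCMTPP; how the propensity scores are estimated is not covered here.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def gap_dipeptide_composition(seq, gap=0):
    """Return the fraction of each residue pair separated by `gap` residues.

    gap=0 gives ordinary (adjacent) dipeptides; pairs containing
    non-standard residues are skipped.
    """
    counts = {a + b: 0 for a, b in product(AMINO_ACIDS, repeat=2)}
    total = 0
    for i in range(len(seq) - gap - 1):
        pair = seq[i] + seq[i + gap + 1]
        if pair in counts:
            counts[pair] += 1
            total += 1
    return {p: (c / total if total else 0.0) for p, c in counts.items()}


def scm_score(seq, propensity, gap=0):
    """Weighted sum of dipeptide composition and propensity scores.

    `propensity` maps dipeptides to scores; missing pairs count as 0
    (an assumption made for this sketch).
    """
    comp = gap_dipeptide_composition(seq, gap)
    return sum(frac * propensity.get(pair, 0.0) for pair, frac in comp.items())


def predict_thermophilic(seq, propensity, threshold, gap=0):
    """Classify a sequence by comparing its score with a threshold."""
    return scm_score(seq, propensity, gap) > threshold


# Hypothetical propensity scores, for illustration only.
toy_propensity = {"AA": 300.0, "AK": 600.0, "KK": 900.0}
toy_score = scm_score("AAKK", toy_propensity)
toy_call = predict_thermophilic("AAKK", toy_propensity, threshold=500.0)
```

Because the score is a plain weighted sum, each dipeptide's contribution to a prediction can be read directly from its propensity value, which is the kind of interpretability the abstract emphasizes.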
Related papers
  • [1] A novel sequence-based predictor for identifying and characterizing thermophilic proteins using estimated propensity scores of dipeptides
    Charoenkwan, Phasit
    Chotpatiwetchkul, Warot
    Lee, Vannajan Sanghiran
    Nantasenamat, Chanin
    Shoombuatong, Watshara
    [J]. SCIENTIFIC REPORTS, 2021, 11 (01)
  • [2] SCMMTP: identifying and characterizing membrane transport proteins using propensity scores of dipeptides
    Liou, Yi-Fan
    Vasylenko, Tamara
    Yeh, Chia-Lun
    Lin, Wei-Chun
    Chiu, Shih-Hsiang
    Charoenkwan, Phasit
    Shu, Li-Sun
    Ho, Shinn-Ying
    Huang, Hui-Ling
    [J]. BMC GENOMICS, 2015, 16
  • [3] SCMMTP: identifying and characterizing membrane transport proteins using propensity scores of dipeptides
    Yi-Fan Liou
    Tamara Vasylenko
    Chia-Lun Yeh
    Wei-Chun Lin
    Shih-Hsiang Chiu
    Phasit Charoenkwan
    Li-Sun Shu
    Shinn-Ying Ho
    Hui-Ling Huang
    [J]. BMC Genomics, 16
  • [4] iUmami-SCM: A Novel Sequence-Based Predictor for Prediction and Analysis of Umami Peptides Using a Scoring Card Method with Propensity Scores of Dipeptides
    Charoenkwan, Phasit
    Yana, Janchai
    Nantasenamat, Chanin
    Hasan, Mehedi
    Shoombuatong, Watshara
    [J]. JOURNAL OF CHEMICAL INFORMATION AND MODELING, 2020, 60 (12) : 6666 - 6678
  • [5] A sequence-based hybrid predictor for identifying conformationally ambivalent regions in proteins
    Liu Y.-C.
    Yang M.-H.
    Lin W.-L.
    Huang C.-K.
    Oyang Y.-J.
    [J]. BMC Genomics, 10 (Suppl 3)
  • [6] iThermo: A Sequence-Based Model for Identifying Thermophilic Proteins Using a Multi-Feature Fusion Strategy
    Ahmed, Zahoor
    Zulfiqar, Hasan
    Khan, Abdullah Aman
    Gul, Ijaz
    Dao, Fu-Ying
    Zhang, Zhao-Yue
    Yu, Xiao-Long
    Tang, Lixia
    [J]. FRONTIERS IN MICROBIOLOGY, 2022, 13
  • [7] SCMRSA: a New Approach for Identifying and Analyzing Anti-MRSA Peptides Using Estimated Propensity Scores of Dipeptides
    Charoenkwan, Phasit
    Kanthawong, Sakawrat
    Schaduangrat, Nalini
    Lio', Pietro
    Moni, Mohammad Ali
    Shoombuatong, Watshara
    [J]. ACS OMEGA, 2022, 7 (36) : 32653 - 32664
  • [8] ThermoFinder: A sequence-based thermophilic proteins prediction framework
    Yu, Han
    Luo, Xiaozhou
    [J]. INTERNATIONAL JOURNAL OF BIOLOGICAL MACROMOLECULES, 2024, 270
  • [9] SCMTHP: A New Approach for Identifying and Characterizing of Tumor-Homing Peptides Using Estimated Propensity Scores of Amino Acids
    Charoenkwan, Phasit
    Chiangjong, Wararat
    Nantasenamat, Chanin
    Moni, Mohammad Ali
    Lio', Pietro
    Manavalan, Balachandran
    Shoombuatong, Watshara
    [J]. PHARMACEUTICS, 2022, 14 (01)
  • [10] GPApred: The first computational predictor for identifying proteins with LPXTG-like motif using sequence-based optimal features
    Malik, Adeel
    Shoombuatong, Watshara
    Kim, Chang-Bae
    Manavalan, Balachandran
    [J]. INTERNATIONAL JOURNAL OF BIOLOGICAL MACROMOLECULES, 2023, 229 : 529 - 538