ThermoFinder: A sequence-based thermophilic proteins prediction framework

Authors
Yu, Han [1, 2, 3, 4]
Luo, Xiaozhou [1]
Affiliations
[1] Chinese Acad Sci, Shenzhen Inst Adv Technol, Shenzhen Key Lab Intelligent Microbial Mfg Med, Shenzhen 518055, Peoples R China
[2] Univ Chinese Acad Sci, Beijing 100049, Peoples R China
[3] Chinese Acad Sci, Shenzhen Inst Synthet Biol, Shenzhen Inst Adv Technol, CAS Key Lab Quantitat Engn Biol, Shenzhen 518055, Peoples R China
[4] Chinese Acad Sci, Shenzhen Inst Synthet Biol, Shenzhen Inst Adv Technol, Ctr Synthet Biochem, Shenzhen 518055, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Thermophilic proteins prediction; Sequence analysis; Machine learning; INFORMATION; LANGUAGE;
DOI
10.1016/j.ijbiomac.2024.132469
Chinese Library Classification
Q5 [Biochemistry]; Q7 [Molecular Biology]
Subject classification codes
071010; 081704
Abstract
Thermophilic proteins are important for academic research and industrial processes, and various computational methods have been developed to identify and screen them. However, the performance of these methods has been limited by the lack of high-quality labeled data and of efficient models for representing proteins. Here, we propose a novel sequence-based thermophilic protein prediction framework called ThermoFinder. The results demonstrate that ThermoFinder outperforms previous state-of-the-art tools on two benchmark datasets, and feature ablation experiments confirm the effectiveness of our approach. Additionally, ThermoFinder exhibits exceptional performance and consistency across two newly constructed datasets, one of which was specifically constructed for regression-based prediction of optimum temperature values directly from protein sequences. Feature importance analysis using Shapley additive explanations further validates the advantages of ThermoFinder. We believe that ThermoFinder will be a valuable and comprehensive framework for predicting thermophilic proteins, and we have made our model open source and available on GitHub at https://github.com/Luo-SynBioLab/ThermoFinder.
Pages: 9