THPLM: a sequence-based deep learning framework for protein stability changes prediction upon point variations using pretrained protein language model

Cited by: 3
Authors
Gong, Jianting [1, 2]
Jiang, Lili [1, 2]
Chen, Yongbing [1, 2]
Zhang, Yixiang [1, 2]
Li, Xue [2]
Ma, Zhiqiang [1, 3]
Fu, Zhiguo [1]
He, Fei [1]
Sun, Pingping [1]
Ren, Zilin [1, 2]
Tian, Mingyao [1, 2]
Affiliations
[1] Northeast Normal Univ, Inst Computat Biol, Sch Informat Sci & Technol, Changchun 130117, Peoples R China
[2] Chinese Acad Agr Sci, Changchun Vet Res Inst, Changchun 130122, Peoples R China
[3] Northeast Normal Univ, Coll Humanities & Sci, Dept Comp Sci, Changchun 130117, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
MUTATIONS; SERVER; GENE;
DOI
10.1093/bioinformatics/btad646
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Motivation: Quantitative determination of protein thermodynamic stability is a critical step in protein and drug design. Reliable prediction of protein stability changes caused by point variations contributes to the development of related fields. Over the past decades, dozens of structure-based and sequence-based methods have been proposed, showing good prediction performance. Despite this progress, it remains necessary to explore wild-type and variant protein representations that capture the stability change in the context of the global sequence. With the development of learning-based structure prediction, protein language models (PLMs) have produced accurate, high-quality predictions of protein structure. Because a PLM captures atomic-level structural information, it can help explain how single-point variations cause functional changes.
Results: Here, we propose THPLM, a sequence-based deep learning model for stability change prediction using Meta's ESM-2. With ESM-2 and a simple convolutional neural network, THPLM achieved performance comparable to or better than most methods, including both sequence-based and structure-based methods. Furthermore, the experimental results indicate that the sequence representations generated by the PLM can effectively improve protein function prediction.
Availability and implementation: The source code of THPLM and the testing data are available at https://github.com/FPPGroup/THPLM.
Pages: 8