Knot or not? Identifying unknotted proteins in knotted families with sequence-based Machine Learning model

Cited by: 3
Authors
Sikora, Maciej [1 ,2 ]
Klimentova, Eva [3 ,4 ]
Uchal, Dawid [1 ,5 ]
Sramkova, Denisa [3 ,4 ]
Perlinska, Agata P. [1 ]
Nguyen, Mai Lan [1 ]
Korpacz, Marta [1 ,2 ]
Malinowska, Roksana [1 ,2 ]
Nowakowski, Szymon [2 ,5 ]
Rubach, Pawel [1 ,6 ]
Simecek, Petr [3 ]
Sulkowska, Joanna I. [1 ]
Affiliations
[1] Univ Warsaw, Ctr New Technol, Banacha 2c, PL-02097 Warsaw, Poland
[2] Univ Warsaw, Fac Math Informat & Mech, Warsaw, Poland
[3] Masaryk Univ, Cent European Inst Technol, Brno 62500, Czech Republic
[4] Masaryk Univ, Fac Sci, Natl Ctr Biomol Res, Brno, Czech Republic
[5] Univ Warsaw, Fac Phys, Warsaw, Poland
[6] Warsaw Sch Econ, Warsaw, Poland
Keywords
AlphaFold; deep learning; knotted proteins; protein topology; SPOUT family proteins; FOLDING MECHANISM; TOPOLOGY; DYNAMICS
DOI
10.1002/pro.4998
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010; 081704
Abstract
Knotted proteins, although scarce, are crucial structural components of certain protein families, and their roles continue to be a topic of intense research. Capitalizing on the vast collection of protein structure predictions offered by AlphaFold (AF), this study computationally examines the entire UniProt database to create a robust dataset of knotted and unknotted proteins. Utilizing this dataset, we develop a machine learning (ML) model capable of accurately predicting the presence of knots in protein structures solely from their amino acid sequences. We tested the model's capabilities on 100 proteins whose structures had not yet been predicted by AF and found agreement with our local prediction in 92% of cases. From the point of view of structural biology, we found that all potentially knotted proteins predicted by AF can be classified into only 17 families. This allows us to discover the presence of unknotted proteins in families with a highly conserved knot. We found only three new protein families (UCH, DUF4253, and DUF2254) that contain both knotted and unknotted proteins, and we demonstrate that deletions within the knot core could potentially account for the observed unknotted (trivial) topology. Finally, we have shown that in the majority of knotted families (11 out of 15), the knotted topology is strictly conserved in functional proteins with very low sequence similarity. We have conclusively demonstrated that proteins AF predicts as unknotted are structurally accurate in their unknotted configurations. However, these proteins often represent nonfunctional fragments, lacking significant portions of the knot core (amino acid sequence).
Pages: 21