Knot or not? Identifying unknotted proteins in knotted families with sequence-based Machine Learning model

Times cited: 3
Authors
Sikora, Maciej [1, 2]
Klimentova, Eva [3, 4]
Uchal, Dawid [1, 5]
Sramkova, Denisa [3, 4]
Perlinska, Agata P. [1]
Nguyen, Mai Lan [1]
Korpacz, Marta [1, 2]
Malinowska, Roksana [1, 2]
Nowakowski, Szymon [2, 5]
Rubach, Pawel [1, 6]
Simecek, Petr [3]
Sulkowska, Joanna I. [1]
Affiliations
[1] Univ Warsaw, Ctr New Technol, Banacha 2c, PL-02097 Warsaw, Poland
[2] Univ Warsaw, Fac Math Informat & Mech, Warsaw, Poland
[3] Masaryk Univ, Cent European Inst Technol, Brno 62500, Czech Republic
[4] Masaryk Univ, Fac Sci, Natl Ctr Biomol Res, Brno, Czech Republic
[5] Univ Warsaw, Fac Phys, Warsaw, Poland
[6] Warsaw Sch Econ, Warsaw, Poland
Keywords
AlphaFold; deep learning; knotted proteins; protein topology; SPOUT family proteins; folding mechanism; topology; dynamics
DOI
10.1002/pro.4998
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Subject classification code
071010; 081704
Abstract
Knotted proteins, although scarce, are crucial structural components of certain protein families, and their roles remain a topic of intense research. Capitalizing on the vast collection of protein structure predictions offered by AlphaFold (AF), this study computationally examines the entire UniProt database to create a robust dataset of knotted and unknotted proteins. Using this dataset, we develop a machine learning (ML) model capable of accurately predicting the presence of knots in protein structures solely from their amino acid sequences. We tested the model on 100 proteins whose structures had not yet been predicted by AF and found agreement with our local predictions in 92% of cases. From a structural biology perspective, we found that all potentially knotted proteins predicted by AF can be classified into only 17 families. This allowed us to identify unknotted proteins within families whose knot is highly conserved. We found only three new protein families (UCH, DUF4253, and DUF2254) that contain both knotted and unknotted proteins, and we demonstrate that deletions within the knot core could account for the observed unknotted (trivial) topology. Finally, we show that in the majority of knotted families (11 out of 15), the knotted topology is strictly conserved among functional proteins despite very low sequence similarity. We have conclusively demonstrated that proteins that AF predicts as unknotted are structurally accurate in their unknotted configurations. However, these proteins often represent nonfunctional fragments that lack significant portions of the knot core sequence.
Pages: 21