Knot or not? Identifying unknotted proteins in knotted families with sequence-based Machine Learning model

Cited by: 3
Authors
Sikora, Maciej [1, 2]
Klimentova, Eva [3, 4]
Uchal, Dawid [1, 5]
Sramkova, Denisa [3, 4]
Perlinska, Agata P. [1]
Nguyen, Mai Lan [1]
Korpacz, Marta [1, 2]
Malinowska, Roksana [1, 2]
Nowakowski, Szymon [2, 5]
Rubach, Pawel [1, 6]
Simecek, Petr [3]
Sulkowska, Joanna I. [1]
Affiliations
[1] Univ Warsaw, Ctr New Technol, Banacha 2c, PL-02097 Warsaw, Poland
[2] Univ Warsaw, Fac Math Informat & Mech, Warsaw, Poland
[3] Masaryk Univ, Cent European Inst Technol, Brno 62500, Czech Republic
[4] Masaryk Univ, Fac Sci, Natl Ctr Biomol Res, Brno, Czech Republic
[5] Univ Warsaw, Fac Phys, Warsaw, Poland
[6] Warsaw Sch Econ, Warsaw, Poland
Keywords
AlphaFold; deep learning; knotted proteins; protein topology; SPOUT family proteins; FOLDING MECHANISM; TOPOLOGY; DYNAMICS;
DOI
10.1002/pro.4998
Chinese Library Classification (CLC)
Q5 [Biochemistry]; Q7 [Molecular Biology];
Discipline classification codes
071010 ; 081704 ;
Abstract
Knotted proteins, although scarce, are crucial structural components of certain protein families, and their roles continue to be a topic of intense research. Capitalizing on the vast collection of protein structure predictions offered by AlphaFold (AF), this study computationally examines the entire UniProt database to create a robust dataset of knotted and unknotted proteins. Utilizing this dataset, we develop a machine learning (ML) model capable of accurately predicting the presence of knots in protein structures solely from their amino acid sequences. We tested the model's capabilities on 100 proteins whose structures had not yet been predicted by AF and found agreement with our local prediction in 92% of cases. From the point of view of structural biology, we found that all potentially knotted proteins predicted by AF can be classified into only 17 families. This allows us to discover the presence of unknotted proteins in families with a highly conserved knot. We found only three new protein families (UCH, DUF4253, and DUF2254) that contain both knotted and unknotted proteins, and we demonstrate that deletions within the knot core could potentially account for the observed unknotted (trivial) topology. Finally, we have shown that in the majority of knotted families (11 out of 15), the knotted topology is strictly conserved in functional proteins with very low sequence similarity. We have conclusively demonstrated that proteins AF predicts as unknotted are structurally accurate in their unknotted configurations. However, these proteins often represent nonfunctional fragments that lack significant portions of the knot core (amino acid sequence).
Pages: 21