ILYCROsite: Identification of lysine crotonylation sites based on FCM-GRNN undersampling technique

Cited by: 0
Authors
Zuo, Yun [1]
Wan, Minquan [1]
Shen, Yang [1]
Wang, Xinheng [1]
He, Wenying [2]
Bi, Yue [3, 4]
Liu, Xiangrong [5]
Deng, Zhaohong [1]
Affiliations
[1] Jiangnan Univ, Sch Artificial Intelligence & Comp Sci, Wuxi 214000, Peoples R China
[2] Hebei Univ Technol, Sch Artificial Intelligence, Tianjin 300130, Peoples R China
[3] Monash Univ, Dept Biochem & Mol Biol, Clayton, Australia
[4] Monash Univ, Biomed Discovery Inst, Clayton, Australia
[5] Xiamen Univ, Natl Inst Data Sci Hlth & Med, Dept Comp Sci & Technol, Xiamen Key Lab Intelligent Storage & Comp, Xiamen 361005, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Protein lysine crotonylation; Fully connected neural network; Imbalance data processing; Sequence analysis; PREDICTION; SEQUENCE; NETWORK;
DOI
10.1016/j.compbiolchem.2024.108212
Chinese Library Classification (CLC) number
Q [Biological Sciences]
Subject classification codes
07; 0710; 09
Abstract
Protein lysine crotonylation is an important post-translational modification that regulates various cellular activities. For example, histone crotonylation affects chromatin structure and promotes histone replacement. Identifying and understanding lysine crotonylation sites is therefore crucial in protein research. However, as the number of known non-histone crotonylation sites grows, existing classifiers based on traditional machine learning may encounter performance limitations. To address this problem, and given the advantages of deep learning for sequence data analysis, this study presents an MLP-Attention-based model for identifying crotonylation sites. First, three feature extraction strategies, namely Amino Acid Composition, K-mer, and Distance-based residue features, were used to encode crotonylated and non-crotonylated sequences. Then, to balance the training dataset, the FCM-GRNN undersampling algorithm, which combines fuzzy clustering with a generalized regression neural network, was introduced. Finally, to improve the effectiveness of crotonylation site identification, we compared various classification algorithms experimentally and selected the multilayer perceptron (MLP) combined with the superimposed self-attention mechanism to construct the prediction model, ILYCROsite. Results from independent testing and five-fold cross-validation demonstrated that ILYCROsite performs well. Notably, on the independent test set, ILYCROsite achieves an AUC of 87.93%, which is significantly better than existing state-of-the-art models.
In addition, SHAP (SHapley Additive exPlanations) values were used to analyze feature importance and the impact of features on model predictions. To make the model easier for researchers to use, we developed a prediction program that identifies crotonylation sites in a given protein sequence. The data and code for this program are available at: https://github.com/wmqskr/ILYCROsite.
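As a rough illustration of two of the encodings named in the abstract, the sketch below computes Amino Acid Composition and K-mer frequency vectors for a peptide window. It is not taken from the ILYCROsite repository: the function names `aac` and `kmer`, the choice k = 2, the example peptide, and the decision to skip non-standard residues are all assumptions made for this example.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

def aac(seq):
    """Amino Acid Composition: fraction of each standard residue in seq."""
    counts = Counter(seq)
    total = sum(counts[a] for a in AMINO_ACIDS)
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[a] / total for a in AMINO_ACIDS]

def kmer(seq, k=2):
    """Normalized frequencies of all 20**k possible k-mers in seq."""
    vocab = ["".join(p) for p in product(AMINO_ACIDS, repeat=k)]
    index = {m: i for i, m in enumerate(vocab)}
    vec = [0.0] * len(vocab)
    n = 0
    for i in range(len(seq) - k + 1):
        m = seq[i:i + k]
        if m in index:  # assumption: skip k-mers with non-standard symbols
            vec[index[m]] += 1
            n += 1
    return [v / n for v in vec] if n else vec

# Hypothetical peptide window, used only to show the call pattern.
window = "AKKLA"
features = aac(window) + kmer(window, k=2)  # 20 + 400 = 420 values
```

Concatenating the two vectors gives one fixed-length input per window, which is the form a downstream classifier such as an MLP would expect. The Distance-based residue features and the FCM-GRNN undersampling step are not sketched here because the abstract does not specify them in enough detail.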
Pages: 12