Protein sumoylation sites prediction based on two-stage feature selection

Cited by: 27
Authors
Lu, Lin [3]
Shi, Xiao-He [5]
Li, Su-Jun [7]
Xie, Zhi-Qun [1]
Feng, Yong-Li [6]
Lu, Wen-Cong [6]
Li, Yi-Xue [4, 7]
Li, Haipeng [1]
Cai, Yu-Dong [1, 2]
Affiliations
[1] Chinese Acad Sci, Shanghai Inst Biol Sci, MPG Partner Inst Computat Biol, Shanghai 200031, Peoples R China
[2] Shanghai Univ, Inst Syst Biol, Shanghai 200244, Peoples R China
[3] Shanghai Jiao Tong Univ, Dept Biomed Engn, Shanghai 200240, Peoples R China
[4] Sch Shanghai Jiao Tong Univ, Shanghai 200240, Peoples R China
[5] Chinese Acad Sci, Shanghai Inst Biol Sci, Inst Hlth Sci, Shanghai 200025, Peoples R China
[6] Coll Sci, Dept Chem, Shanghai 200444, Peoples R China
[7] Chinese Acad Sci, Shanghai Inst Biol Sci, Key Lab Syst Biol, Shanghai 200031, Peoples R China
Keywords
Prediction; Protein sumoylation; mRMR; AAIndex; Nearest Neighbor Algorithm; Leave-one-out cross-validation; Bioinformatics; ACID INDEX DATABASE; SUMO; CONJUGATION; AAINDEX; UBC9
DOI
10.1007/s11030-009-9149-5
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Subject classification code
071010; 081704
Abstract
Protein sumoylation is one of the most important post-translational modifications. Accurate prediction of sumoylation sites is very useful for proteome analysis. Although the putative motif ΨKXE (Ψ, a large hydrophobic residue; X, any amino acid) can be used, optimizing prediction models remains a challenge. In this study, we developed a prediction system based on a feature selection strategy. A total of 1,272 peptides of 14 residues from SUMOsp (Xue et al. [8] Nucleic Acids Res 34:W254-W257, 2006) were investigated, comprising 212 substrates and 1,060 non-substrates. Of the substrates, only 162 comply with the motif ΨKXE. First, the 1,272 peptides were divided into a training set and a test set. All peptides were encoded into feature vectors using hundreds of amino acid properties collected in the Amino Acid Index Database (AAIndex, http://www.genome.jp/aaindex ). Then, the mRMR (minimum redundancy-maximum relevance) method was applied to extract the most informative features. Finally, the Nearest Neighbor Algorithm (NNA) was used to build the prediction models. Tested by leave-one-out (LOO) cross-validation, the optimal prediction model reaches an accuracy of 84.4% on the training set and 76.4% on the test set. In particular, 180 substrates were correctly predicted, 18 more than with the motif ΨKXE. The finally selected features indicate that the residues two positions downstream and one position upstream of the sumoylation site play the most important role in determining whether sumoylation occurs. Built on this feature selection strategy, our prediction system can be used not only for high-throughput prediction of sumoylation sites but also as a tool to investigate the mechanism of sumoylation.
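The pipeline summarized in the abstract (AAIndex encoding, mRMR feature ranking, nearest-neighbor prediction scored by leave-one-out) can be sketched as follows. This is a minimal, hypothetical illustration, not the authors' code. Assumptions: a single AAIndex scale (Kyte-Doolittle hydropathy) stands in for the hundreds used in the study; each feature is discretized into three levels at mean ± 0.5·std before estimating mutual information; mRMR uses the difference (MID) form; the nearest-neighbor rule uses a cosine-style distance; and the second stage, which scores growing prefixes of the mRMR ranking by LOO accuracy, is only a guess at what "two-stage" selection involves. The toy peptides and helper names are likewise hypothetical.

```python
import math
from collections import Counter

# Kyte-Doolittle hydropathy, one of the AAIndex scales (illustrative choice only).
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def encode(peptide, scales):
    """Feature vector with one value per (position, property) pair."""
    return [scale[aa] for aa in peptide for scale in scales]

def discretize(column, alpha=0.5):
    """Map continuous values to 3 levels at mean +/- alpha*std (assumed scheme)."""
    n = len(column)
    mean = sum(column) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in column) / n)
    lo, hi = mean - alpha * std, mean + alpha * std
    return [0 if v < lo else (2 if v > hi else 1) for v in column]

def mutual_info(x, y):
    """Mutual information (nats) between two discrete sequences of equal length."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum(
        (c / n) * math.log((c / n) / ((px[a] / n) * (py[b] / n)))
        for (a, b), c in pxy.items()
    )

def mrmr_rank(features, labels, k):
    """Greedy mRMR (MID): relevance to labels minus mean redundancy with picks so far."""
    disc = [discretize(col) for col in features]  # features: list of columns
    relevance = [mutual_info(col, labels) for col in disc]
    selected = [max(range(len(disc)), key=relevance.__getitem__)]
    while len(selected) < min(k, len(disc)):
        remaining = [j for j in range(len(disc)) if j not in selected]

        def score(j):
            redundancy = sum(mutual_info(disc[j], disc[s]) for s in selected)
            return relevance[j] - redundancy / len(selected)

        selected.append(max(remaining, key=score))
    return selected

def cosine_distance(u, v):
    """D(u, v) = 1 - u.v / (|u||v|); treated as maximal if either vector is zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm if norm else 1.0

def loo_accuracy(vectors, labels):
    """Leave-one-out accuracy of a 1-nearest-neighbor classifier."""
    correct = 0
    for i, query in enumerate(vectors):
        nearest = min((other for other in range(len(vectors)) if other != i),
                      key=lambda other: cosine_distance(query, vectors[other]))
        correct += labels[nearest] == labels[i]
    return correct / len(vectors)

def incremental_selection(X, labels, ranking):
    """Assumed second stage: keep the ranking prefix with the best LOO accuracy."""
    best = (-1.0, [])
    for m in range(1, len(ranking) + 1):
        subset = ranking[:m]
        acc = loo_accuracy([[row[j] for j in subset] for row in X], labels)
        if acc > best[0]:
            best = (acc, subset)
    return best

if __name__ == "__main__":
    # Hypothetical 5-residue windows centred on the candidate lysine.
    peptides = ["GIKTE", "ALKQE", "SVKPE", "PDKSG", "NGKRS", "TEKGA"]
    labels = [1, 1, 1, 0, 0, 0]
    X = [encode(p, [HYDROPATHY]) for p in peptides]
    columns = [list(col) for col in zip(*X)]
    ranking = mrmr_rank(columns, labels, k=len(columns))
    print("mRMR ranking:", ranking)
    print("best (LOO accuracy, features):", incremental_selection(X, labels, ranking))
```

Adding more AAIndex scales only means passing more dictionaries to `encode`; the accuracies reported above come from the study's own data and settings and are not something this toy example reproduces.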
Pages: 81-86
Number of pages: 6
Journal: Molecular Diversity, 2010, vol. 14