Prediction of human disease-associated phosphorylation sites with combined feature selection approach and support vector machine

Cited by: 11
Authors
Xu, Xiaoyi [1 ]
Li, Ao [1 ,2 ]
Wang, Minghui [1 ,2 ]
Affiliations
[1] Univ Sci & Technol China, Sch Informat Sci & Technol, AH-230027 Hefei, Peoples R China
[2] Univ Sci & Technol China, Ctr Biomed Engn, AH-230027 Hefei, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
proteins; cellular biophysics; diseases; support vector machines; feature selection; filtering theory; medical computing; bioinformatics; forward feature selection process; minimum-redundancy-maximum-relevance filtering process; cellular process; post-translational modification; support vector machine; human disease-associated phosphorylation sites; PROTEIN-PHOSPHORYLATION; PATTERN-RECOGNITION; IDENTIFICATION; SEQUENCE;
DOI
10.1049/iet-syb.2014.0051
Chinese Library Classification number
Q2 [Cell Biology];
Discipline classification code
071009 ; 090102 ;
Abstract
Phosphorylation is a crucial post-translational modification that regulates almost all cellular processes. It has long been recognised that protein phosphorylation is closely related to disease, and many studies have therefore sought to predict phosphorylation sites for disease treatment and drug design. However, despite the success of these approaches, none focuses on predicting disease-associated phosphorylation sites. Herein, the authors propose, for the first time, an approach specifically designed to identify associations between phosphorylation sites and human diseases. To take full advantage of local sequence information, a combined feature selection method-based support vector machine (CFS-SVM) is developed that incorporates a minimum-redundancy-maximum-relevance filtering process and a forward feature selection process. Performance evaluation shows that CFS-SVM performs significantly better than widely used classifiers, including Bayesian decision theory, k-nearest neighbour and random forest. Even at an extremely high specificity of 99%, CFS-SVM still achieves high sensitivity. In addition, tests on extra data confirm the effectiveness and general applicability of the CFS-SVM approach across a variety of diseases. Finally, analysis of the selected features and corresponding kinases also helps to clarify the potential mechanisms of disease-phosphorylation relationships and to guide further experimental validation.
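The two-stage selection named in the abstract (a minimum-redundancy-maximum-relevance filter followed by forward feature selection) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy data, and the use of discrete mutual information are assumptions made for illustration, and `table_accuracy` is a hypothetical stand-in for the SVM performance estimate that the wrapper stage would use in CFS-SVM.

```python
from collections import Counter
from math import log2

def mutual_information(xs, ys):
    """Mutual information (in bits) between two equal-length discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum((c / n) * log2(c * n / (px[x] * py[y])) for (x, y), c in pxy.items())

def mrmr_rank(columns, labels, k):
    """Greedy mRMR filter: pick the feature with the highest
    relevance minus mean redundancy against features already picked."""
    relevance = [mutual_information(col, labels) for col in columns]
    selected, remaining = [], list(range(len(columns)))
    while remaining and len(selected) < k:
        def score(j):
            if not selected:
                return relevance[j]
            redundancy = sum(mutual_information(columns[j], columns[s])
                             for s in selected) / len(selected)
            return relevance[j] - redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected

def forward_select(ranked, evaluate):
    """Walk the ranked list; keep a feature only if it strictly improves evaluate(subset)."""
    chosen, best = [], float("-inf")
    for j in ranked:
        s = evaluate(chosen + [j])
        if s > best:
            chosen, best = chosen + [j], s
    return chosen, best

def table_accuracy(columns, labels, subset):
    """Toy evaluator (assumption): training accuracy of a majority-vote lookup table.
    In an SVM-based pipeline this would be replaced by cross-validated SVM performance."""
    groups = {}
    for i, y in enumerate(labels):
        key = tuple(columns[j][i] for j in subset)
        groups.setdefault(key, Counter())[y] += 1
    return sum(max(c.values()) for c in groups.values()) / len(labels)

# Hypothetical toy data: feature 1 duplicates feature 0, feature 2 adds new information.
labels = [0, 0, 0, 1, 1, 1, 1, 1]
columns = [
    [0, 0, 0, 0, 1, 1, 1, 1],  # feature 0: strongly relevant
    [0, 0, 0, 0, 1, 1, 1, 1],  # feature 1: redundant copy of feature 0
    [0, 0, 0, 1, 0, 0, 1, 1],  # feature 2: weaker but complementary
]
ranked = mrmr_rank(columns, labels, k=3)
chosen, best = forward_select(ranked, lambda s: table_accuracy(columns, labels, s))
print(ranked, chosen, best)
```

On this toy data the filter ranks the redundant copy last, and the forward pass keeps only the features that raise the evaluator's score. Separating a cheap filter from a costlier wrapper is one plausible reading of "combined feature selection"; the original paper should be consulted for the exact procedure.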
Pages: 155-163
Page count: 9