Prediction of protein subcellular localization by support vector machines using multi-scale energy and pseudo amino acid composition

Cited by: 121
Authors
Shi, J.-Y. [1]
Zhang, S.-W. [1]
Pan, Q. [1]
Cheng, Y.-M. [1]
Xie, J. [1]
Affiliation
[1] Northwestern Polytechnical University, College of Automation, Xi'an 710072, People's Republic of China
Keywords
multi-scale energy; wavelet transform; support vector machines; Chou's pseudo amino acid composition; protein subcellular localization
DOI
10.1007/s00726-006-0475-y
Chinese Library Classification number
Q5 [Biochemistry]; Q7 [Molecular Biology];
Subject classification code
071010 ; 081704 ;
Abstract
As more and more genomes have been discovered in recent years, there is an urgent need for a reliable method to predict the subcellular localization of the rapidly growing number of newly found proteins. However, many well-known prediction methods based on amino acid composition have difficulty utilizing sequence-order information. Here, based on the concept of Chou's pseudo amino acid composition (PseAA), a new feature extraction method, the multi-scale energy (MSE) approach, is introduced to incorporate sequence-order information. First, a protein sequence was mapped to a digital signal using the amino acid index. Then, by wavelet transform, the mapped signal was decomposed into several scales, at each of which an energy factor was calculated; these factors together formed an MSE feature vector. Next, by combining this MSE feature vector with the amino acid composition (AA), we constructed a series of MSEPseAA feature vectors to represent the protein sequences. Finally, using a new normalization approach, the MSEPseAA feature vectors were normalized to form improved MSEPseAA vectors, named IEPseAA. Using IEPseAA together with a C-support vector machine (C-SVM) and three multi-class SVM strategies, quite promising results were obtained, indicating that MSE is effective in reflecting sequence-order effects and might become a useful tool for predicting other attributes of proteins as well.
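The feature-extraction steps described in the abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not name the amino acid index, the wavelet, or the energy formula, so the sketch assumes the Kyte-Doolittle hydropathy scale, a hand-written Haar wavelet, and the mean squared coefficient per scale. All function names are hypothetical, and the normalization step and the SVM classifier are omitted because the abstract does not specify them.

```python
import math

# ASSUMPTION: Kyte-Doolittle hydropathy values stand in for "the amino acid index".
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def to_signal(seq):
    """Map a protein sequence to a numeric signal, skipping unknown residues."""
    return [HYDROPATHY[aa] for aa in seq if aa in HYDROPATHY]


def haar_step(signal):
    """One level of an orthonormal Haar transform (ASSUMED wavelet).

    Odd-length input is padded by repeating its last sample.
    Returns (approximation, detail) coefficient lists.
    """
    if len(signal) % 2:
        signal = signal + [signal[-1]]
    root2 = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / root2 for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / root2 for i in range(0, len(signal), 2)]
    return approx, detail


def multi_scale_energy(signal, levels):
    """Return levels + 1 energy factors: one per detail scale, then the final approximation.

    ASSUMPTION: the energy factor is the mean squared coefficient at each scale.
    """
    if not signal:
        raise ValueError("signal is empty")
    energies = []
    approx = list(signal)
    for _ in range(levels):
        approx, detail = haar_step(approx)
        energies.append(sum(d * d for d in detail) / len(detail))
    energies.append(sum(a * a for a in approx) / len(approx))
    return energies


def amino_acid_composition(seq):
    """Return the 20 relative frequencies of the standard amino acids."""
    known = [aa for aa in seq if aa in HYDROPATHY]
    if not known:
        raise ValueError("sequence has no standard amino acids")
    return [known.count(aa) / len(known) for aa in AMINO_ACIDS]


def mse_pse_aa(seq, levels=3):
    """Concatenate the amino acid composition with the multi-scale energy factors."""
    return amino_acid_composition(seq) + multi_scale_energy(to_signal(seq), levels)


if __name__ == "__main__":
    features = mse_pse_aa("MKVLAAGIVGLLLAQ", levels=3)  # made-up example sequence
    print(len(features))
```

With `levels=3` the vector has 20 composition entries followed by 4 energy factors. In practice such vectors would be normalized and passed to a multi-class SVM, as the abstract describes.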
Pages: 69-74
Page count: 6
Related papers
50 in total
  • [21] A novel method for predicting protein subcellular localization based on pseudo amino acid composition
    Ma, Junwei
    Gu, Hong
    BMB REPORTS, 2010, 43 (10) : 670 - 676
  • [22] Prediction of protein structure classes with pseudo amino acid composition and fuzzy support vector machine network
    Ding, Yong-Sheng
    Zhang, Tong-Liang
    Chou, Kuo-Chen
    PROTEIN AND PEPTIDE LETTERS, 2007, 14 (08) : 811 - 815
  • [23] Using pseudo amino acid composition to predict protein subcellular location: approached with amino acid composition distribution
    Shi, J. -Y.
    Zhang, S. -W.
    Pan, Q.
    Zhou, G. -P.
    AMINO ACIDS, 2008, 35 (02) : 321 - 327
  • [25] Prediction of apoptosis protein subcellular location based on amphiphilic pseudo amino acid composition
    Su, Wenxia
    Deng, Shuyi
    Gu, Zhifeng
    Yang, Keli
    Ding, Hui
    Chen, Hui
    Zhang, Zhaoyue
    FRONTIERS IN GENETICS, 2023, 14
  • [26] Using a Novel AdaBoost Algorithm and Chou's Pseudo Amino Acid Composition for Predicting Protein Subcellular Localization
    Lin, Jie
    Wang, Yan
    PROTEIN AND PEPTIDE LETTERS, 2011, 18 (12) : 1219 - 1225
  • [27] Prediction of Protein Secondary Structure Content by Using the Concept of Chou's Pseudo Amino Acid Composition and Support Vector Machine
    Chen, Chao
    Chen, Lixuan
    Zou, Xiaoyong
    Cai, Peixiang
    PROTEIN AND PEPTIDE LETTERS, 2009, 16 (01) : 27 - 31
  • [28] Weighted-support vector machines for predicting membrane protein types based on pseudo-amino acid composition
    Wang, M
    Yang, J
    Liu, GP
    Xu, ZJ
    Chou, KC
    PROTEIN ENGINEERING DESIGN & SELECTION, 2004, 17 (06) : 509 - 516
  • [29] Support vector machine approach for protein subcellular localization prediction
    Hua, SJ
    Sun, ZR
    BIOINFORMATICS, 2001, 17 (08) : 721 - 728
  • [30] A novel representation of protein sequences for prediction of subcellular location using support vector machines
    Matsuda, S
    Vert, JP
    Saigo, H
    Ueda, N
    Toh, H
    Akutsu, T
    PROTEIN SCIENCE, 2005, 14 (11) : 2804 - 2813