Prediction of protein subcellular localization by support vector machines using multi-scale energy and pseudo amino acid composition

Cited by: 121
Authors
Shi, J.-Y. [1]
Zhang, S.-W. [1]
Pan, Q. [1]
Cheng, Y.-M. [1]
Xie, J. [1]
Affiliations
[1] Northwestern Polytech Univ, Coll Automat, Xian 710072, Peoples R China
Keywords
multi-scale energy; wavelet transform; support vector machines; Chou's pseudo amino acid composition; protein subcellular localization
DOI
10.1007/s00726-006-0475-y
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010; 081704
Abstract
As more and more genomes have been sequenced in recent years, there is an urgent need for a reliable method to predict the subcellular localization of the rapidly growing number of newly found proteins. However, many well-known prediction methods based on amino acid composition have difficulty utilizing sequence-order information. Here, based on the concept of Chou's pseudo amino acid composition (PseAA), a new feature extraction method, the multi-scale energy (MSE) approach, is introduced to incorporate the sequence-order information. First, a protein sequence was mapped to a digital signal using the amino acid index. Then, by wavelet transform, the mapped signal was decomposed into several scales, at each of which an energy factor was calculated; together these factors formed an MSE feature vector. Following this, by combining the MSE feature vector with amino acid composition (AA), we constructed a series of MSEPseAA feature vectors to represent the protein sequences. Finally, using a new normalization approach, the MSEPseAA feature vectors were normalized to form the improved MSEPseAA vectors, named IEPseAA. Using IEPseAA, the C-support vector machine (C-SVM), and three multi-class SVM strategies, quite promising results were obtained, indicating that MSE is effective in reflecting sequence-order effects and might become a useful tool for predicting other attributes of proteins as well.
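The feature-extraction steps described in the abstract (index mapping, wavelet decomposition, per-scale energy, concatenation with amino acid composition) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the amino acid index, the wavelet, the exact energy definition, or the normalization used for IEPseAA. The sketch therefore assumes the Kyte-Doolittle hydropathy scale as the index, a Haar wavelet, and energy defined as the sum of squared coefficients per scale, scaled to sum to 1. All function names (`to_signal`, `haar_step`, `multi_scale_energy`, `aa_composition`, `mse_pseaa`) are hypothetical, and the IEPseAA normalization and the SVM classifier are left out.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydropathy values, used only as an example amino acid index.
# Assumption: the abstract does not say which index the authors used.
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def to_signal(seq):
    """Map each residue to its index value, giving a numeric signal."""
    return [HYDROPATHY[residue] for residue in seq]


def haar_step(signal):
    """One level of a Haar wavelet transform (assumed wavelet).

    Odd-length inputs are padded by repeating the last sample.
    Returns (approximation, detail) coefficient lists.
    """
    if len(signal) % 2:
        signal = signal + [signal[-1]]
    root2 = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / root2 for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / root2 for i in range(0, len(signal), 2)]
    return approx, detail


def multi_scale_energy(signal, levels=3):
    """Energy of the detail coefficients at each level, plus the energy of
    the final approximation, scaled so the values sum to 1 (assumed definition)."""
    energies = []
    approx = list(signal)
    for _ in range(levels):
        approx, detail = haar_step(approx)
        energies.append(sum(d * d for d in detail))
    energies.append(sum(a * a for a in approx))
    total = sum(energies)
    return [e / total for e in energies] if total > 0 else energies


def aa_composition(seq):
    """Fraction of each of the 20 standard amino acids in the sequence."""
    return [seq.count(a) / len(seq) for a in AMINO_ACIDS]


def mse_pseaa(seq, levels=3):
    """Concatenate amino acid composition (20 values) with the MSE vector
    (levels + 1 values)."""
    return aa_composition(seq) + multi_scale_energy(to_signal(seq), levels)


features = mse_pseaa("MKTAYIAKQRQISFVKSHFSRQ", levels=3)
print(len(features))
```

With `levels=3` the vector has 20 composition components followed by 4 energy components. Varying `levels` gives vectors of different lengths, which is one plausible reading of the "series of MSEPseAA feature vectors" mentioned in the abstract.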
Pages: 69-74
Number of pages: 6
Related papers
50 in total
  • [41] Supervised learning method for the prediction of subcellular localization of proteins using amino acid and amino acid pair composition
    Tanwir Habib
    Chaoyang Zhang
    Jack Y Yang
    Mary Qu Yang
    Youping Deng
    BMC Genomics, 9
  • [42] Supervised learning method for the prediction of subcellular localization of proteins using amino acid and amino acid pair composition
    Habib, Tanwir
    Zhang, Chaoyang
    Yang, Jack Y.
    Yang, Mary Qu
    Deng, Youping
    BMC GENOMICS, 2008, 9 (Suppl 1)
  • [43] Prediction and classification of protein subcellular location - Sequence-order effect and pseudo amino acid composition
    Chou, KC
    Cai, YD
    JOURNAL OF CELLULAR BIOCHEMISTRY, 2003, 90 (06) : 1250 - 1260
  • [44] Protein subcellular location prediction based on pseudo amino acid composition and PSI-blast profile
    Xu, Huimin
    Yan, Shoujiang
    Dai, Qi
    He, Ping-An
    Liao, Bo
    Yao, Yu-Hua
    Journal of Computational and Theoretical Nanoscience, 2015, 12 (10) : 3756 - 3762
  • [45] Predicting subcellular localization of proteins using support vector machine with N-terminal amino composition
    Li, YF
    Liu, J
    ADVANCED DATA MINING AND APPLICATIONS, PROCEEDINGS, 2005, 3584 : 618 - 625
  • [46] Prediction of protein cellular attributes using pseudo-amino acid composition
    Chou, KC
    PROTEINS-STRUCTURE FUNCTION AND GENETICS, 2001, 43 (03) : 246 - 255
  • [47] Protein subcellular localization prediction using multiple kernel learning based support vector machine
    Hasan, Md. Al Mehedi
    Ahmad, Shamim
    Molla, Md. Khademul Islam
    MOLECULAR BIOSYSTEMS, 2017, 13 (04) : 785 - 795
  • [48] Using cellular automata images and pseudo amino acid composition to predict protein subcellular location
    X. Xiao
    S. Shao
    Y. Ding
    Z. Huang
    K.-C. Chou
    Amino Acids, 2006, 30 (1) : 49 - 54
  • [49] Using cellular automata images and pseudo amino acid composition to predict protein subcellular location
    Xiao, X
    Shao, S
    Ding, Y
    Huang, Z
    Chou, KC
    AMINO ACIDS, 2006, 30 (01) : 49 - 54
  • [50] Protein cellular localization prediction with support vector machines and decision trees
    Lorena, Ana Carolina
    de Carvalho, Andre C. P. L. F.
    COMPUTERS IN BIOLOGY AND MEDICINE, 2007, 37 (02) : 115 - 125