Prediction of Protein Subcellular Localization Based on Fusion of Multi-view Features

Cited by: 18
Authors
Li, Bo [1]
Cai, Lijun [1]
Liao, Bo [1, 2]
Fu, Xiangzheng [1]
Bing, Pingping [3]
Yang, Jialiang [2]
Affiliations
[1] Hunan Univ, Coll Informat Sci & Engn, Changsha 410082, Hunan, Peoples R China
[2] Hainan Normal Univ, Sch Math & Stat, Haikou 570100, Hainan, Peoples R China
[3] Changsha Med Univ, Acad Working Stn, Changsha 410219, Hunan, Peoples R China
Keywords
protein subcellular localization; protein primary sequence; generalized chaos game representation; statistical method; support vector machine; unitary distance; FEATURE-EXTRACTION; LOCATION;
DOI
10.3390/molecules24050919
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010; 081704
Abstract
The prediction of protein subcellular localization is critical for inferring protein functions, gene regulation and protein-protein interactions. With advances in high-throughput sequencing technologies and proteomic methods, the protein sequences of numerous yeasts have become publicly available, which makes it possible to predict yeast protein subcellular localization computationally. However, widely used protein sequence representation techniques, such as amino acid composition and Chou's pseudo amino acid composition (PseAAC), have difficulty extracting adequate information about the interactions between residues and the position distribution of each residue. Novel sequence representations are therefore still needed. In this study, we present two novel protein sequence representation techniques: Generalized Chaos Game Representation (GCGR), based on the frequency and distribution of the residues in the protein primary sequence, and novel statistics and information theory (NSI), reflecting local position information of the sequence. In the GCGR + NSI representation, a protein primary sequence is represented by a 5-dimensional feature vector, whereas other popular methods such as PseAAC and dipeptide composition adopt features with hundreds of dimensions. In practice, the feature representation is highly efficient in predicting protein subcellular localization. Even without machine learning-based classifiers, a simple model based on the feature vector achieves prediction accuracies of 0.8825 and 0.7736 for the CL317 and ZW225 datasets, respectively. To further evaluate the effectiveness of the proposed encoding schemes, we introduce a multi-view features-based method that combines the two above-mentioned features with other well-known features, including PseAAC and dipeptide composition, and uses a support vector machine as the classifier to predict protein subcellular localization. This model achieves prediction accuracies of 0.927 and 0.871 for the CL317 and ZW225 datasets, respectively, better than other existing methods in the jackknife tests. The results suggest that the GCGR and NSI features are useful complements to popular protein sequence representations in predicting yeast protein subcellular localization. Finally, we validate a few newly predicted protein subcellular localizations against evidence from published articles in authoritative journals and books.
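To make the dimensionality contrast in the abstract concrete, the following is a minimal sketch of the two conventional baselines it names, amino acid composition (20 dimensions) and dipeptide composition (400 dimensions), using their standard textbook definitions. It is not the authors' GCGR or NSI encoding, which the abstract does not specify; the function names `amino_acid_composition` and `dipeptide_composition` and the example sequence are hypothetical and chosen only for illustration.

```python
from itertools import product

# The 20 standard amino acids in one-letter code (assumed alphabet).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(seq):
    """Return the 20-dimensional vector of standard-residue frequencies."""
    residues = [r for r in seq.upper() if r in AMINO_ACIDS]
    n = len(residues)
    if n == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [residues.count(a) / n for a in AMINO_ACIDS]


def dipeptide_composition(seq):
    """Return the 400-dimensional vector of adjacent-residue-pair frequencies.

    Pairs containing a non-standard symbol are skipped (an assumption of
    this sketch, not something stated in the source).
    """
    s = seq.upper()
    counts = {}
    total = 0
    for a, b in zip(s, s[1:]):
        if a in AMINO_ACIDS and b in AMINO_ACIDS:
            counts[a + b] = counts.get(a + b, 0) + 1
            total += 1
    if total == 0:
        return [0.0] * (len(AMINO_ACIDS) ** 2)
    return [counts.get(a + b, 0) / total
            for a, b in product(AMINO_ACIDS, repeat=2)]


if __name__ == "__main__":
    example = "MKTAYIAKQR"  # hypothetical short sequence
    print(len(amino_acid_composition(example)))   # length of the AAC vector
    print(len(dipeptide_composition(example)))    # length of the dipeptide vector
```

Either vector can be fed to any standard classifier. The jackknife test mentioned in the abstract corresponds to leave-one-out evaluation: each protein is held out in turn and predicted by a model trained on all the others.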
Pages: 13