Prediction of Protein Subcellular Localization Based on Fusion of Multi-view Features

Cited by: 18
Authors
Li, Bo [1]
Cai, Lijun [1]
Liao, Bo [1, 2]
Fu, Xiangzheng [1]
Bing, Pingping [3]
Yang, Jialiang [2]
Affiliations
[1] Hunan Univ, Coll Informat Sci & Engn, Changsha 410082, Hunan, Peoples R China
[2] Hainan Normal Univ, Sch Math & Stat, Haikou 570100, Hainan, Peoples R China
[3] Changsha Med Univ, Acad Working Stn, Changsha 410219, Hunan, Peoples R China
Keywords
protein subcellular localization; protein primary sequence; generalized chaos game representation; statistical method; support vector machine; unitary distance; FEATURE-EXTRACTION; LOCATION;
DOI
10.3390/molecules24050919
Chinese Library Classification
Q5 [Biochemistry]; Q7 [Molecular Biology]
Subject classification codes
071010; 081704
Abstract
The prediction of protein subcellular localization is critical for inferring protein functions, gene regulation, and protein-protein interactions. With advances in high-throughput sequencing technologies and proteomic methods, the protein sequences of numerous yeasts have become publicly available, which makes it possible to predict yeast protein subcellular localization computationally. However, widely used protein sequence representation techniques, such as amino acid composition and Chou's pseudo amino acid composition (PseAAC), struggle to extract adequate information about the interactions between residues and the positional distribution of each residue. Novel sequence representations are therefore still urgently needed. In this study, we present two novel protein sequence representation techniques: Generalized Chaos Game Representation (GCGR), based on the frequency and distribution of the residues in the protein primary sequence, and novel statistics and information theory (NSI), which reflects local position information of the sequence. In the GCGR + NSI representation, a protein primary sequence is represented by a 5-dimensional feature vector, whereas other popular methods such as PseAAC and dipeptide composition use feature vectors with hundreds of dimensions. In practice, this feature representation is highly efficient in predicting protein subcellular localization. Even without machine learning-based classifiers, a simple model based on the feature vector achieves prediction accuracies of 0.8825 and 0.7736 for the CL317 and ZW225 datasets, respectively. To further evaluate the effectiveness of the proposed encoding schemes, we introduce a multi-view feature-based method that combines the two above-mentioned features with other well-known features, including PseAAC and dipeptide composition, and uses a support vector machine as the classifier to predict protein subcellular localization.
This model achieves prediction accuracies of 0.927 and 0.871 for the CL317 and ZW225 datasets, respectively, outperforming other existing methods in jackknife tests. The results suggest that the GCGR and NSI features are useful complements to popular protein sequence representations in predicting yeast protein subcellular localization. Finally, we validate a few newly predicted protein subcellular localizations with evidence from published articles in authoritative journals and books.
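The abstract names several building blocks without defining them in detail. As a rough illustration only, the sketch below computes two of the well-known views it mentions (amino acid composition, 20 dimensions, and dipeptide composition, 400 dimensions), fuses them by simple concatenation, and scores a classifier with a jackknife (leave-one-out) test. The GCGR and NSI features are not reproduced because the abstract does not specify them. The function names, the concatenation-based fusion, and the nearest-neighbor stand-in classifier are all assumptions made for this example; the paper itself uses a support vector machine.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 pairs


def amino_acid_composition(seq):
    """Return the 20-dimensional vector of residue frequencies."""
    residues = [r for r in seq.upper() if r in AMINO_ACIDS]
    n = len(residues)
    return [residues.count(a) / n if n else 0.0 for a in AMINO_ACIDS]


def dipeptide_composition(seq):
    """Return the 400-dimensional vector of adjacent-pair frequencies."""
    clean = "".join(r for r in seq.upper() if r in AMINO_ACIDS)
    counts = dict.fromkeys(DIPEPTIDES, 0)
    for i in range(len(clean) - 1):
        counts[clean[i:i + 2]] += 1
    total = max(len(clean) - 1, 1)  # avoid division by zero on short input
    return [counts[d] / total for d in DIPEPTIDES]


def fuse_views(seq):
    """Hypothetical fusion: concatenate both views into 420 dimensions."""
    return amino_acid_composition(seq) + dipeptide_composition(seq)


def nearest_neighbor(train_x, train_y, query):
    """Stand-in classifier (NOT the paper's SVM): label of closest sample."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    best = min(range(len(train_x)), key=lambda j: dist(train_x[j], query))
    return train_y[best]


def jackknife_accuracy(features, labels, predict):
    """Leave-one-out test: hold out each sample once, train on the rest."""
    hits = 0
    for i in range(len(features)):
        train_x = features[:i] + features[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        hits += predict(train_x, train_y, features[i]) == labels[i]
    return hits / len(features)
```

In use, each sequence would be mapped through `fuse_views`, and the resulting vectors and their location labels passed to `jackknife_accuracy` together with any classifier that has the same `(train_x, train_y, query)` signature.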
Pages: 13