Prediction of Protein Subcellular Localization Based on Fusion of Multi-view Features

Cited by: 18

Authors:

Li, Bo [1]

Cai, Lijun [1]

Liao, Bo [1,2]

Fu, Xiangzheng [1]

Bing, Pingping [3]

Yang, Jialiang [2]

Affiliations:

[1] Hunan Univ, Coll Informat Sci & Engn, Changsha 410082, Hunan, Peoples R China

[2] Hainan Normal Univ, Sch Math & Stat, Haikou 570100, Hainan, Peoples R China

[3] Changsha Med Univ, Acad Working Stn, Changsha 410219, Hunan, Peoples R China

Source:

MOLECULES | 2019 / Vol. 24 / Issue 05

Keywords:

protein subcellular localization; protein primary sequence; generalized chaos game representation; statistical method; support vector machine; unitary distance; FEATURE-EXTRACTION; LOCATION;

DOI:

10.3390/molecules24050919

Chinese Library Classification (CLC) number:

Q5 [Biochemistry]; Q7 [Molecular Biology];

Discipline classification code:

071010 ; 081704 ;

Abstract:

Predicting protein subcellular localization is critical for inferring protein function, gene regulation, and protein-protein interactions. With advances in high-throughput sequencing technologies and proteomic methods, the protein sequences of numerous yeasts have become publicly available, which makes it possible to predict yeast protein subcellular localization computationally. However, widely used protein sequence representations, such as amino acid composition and Chou's pseudo amino acid composition (PseAAC), have difficulty capturing adequate information about the interactions between residues and the position distribution of each residue. Novel sequence representations are therefore still needed. In this study, we present two novel protein sequence representation techniques: Generalized Chaos Game Representation (GCGR), based on the frequency and distribution of the residues in the protein primary sequence, and novel statistics and information theory (NSI), reflecting local position information of the sequence. In the GCGR + NSI representation, a protein primary sequence is represented by a 5-dimensional feature vector, whereas other popular methods such as PseAAC and dipeptide composition use features with hundreds of dimensions. In practice, this feature representation is highly efficient in predicting protein subcellular localization. Even without machine learning-based classifiers, a simple model based on the feature vector achieves prediction accuracies of 0.8825 and 0.7736 for the CL317 and ZW225 datasets, respectively. To further evaluate the effectiveness of the proposed encoding schemes, we introduce a multi-view features-based method that combines the two above-mentioned features with other well-known features, including PseAAC and dipeptide composition, and uses a support vector machine as the classifier to predict protein subcellular localization. This model achieves prediction accuracies of 0.927 and 0.871 for the CL317 and ZW225 datasets, respectively, better than other existing methods in the jackknife tests. The results suggest that the GCGR and NSI features are useful complements to popular protein sequence representations in predicting yeast protein subcellular localization. Finally, we validate a few newly predicted protein subcellular localizations with evidence from published articles in authoritative journals and books.
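
To make the idea of combining feature views concrete, the following is a minimal Python sketch of two of the well-known views named in the abstract: amino acid composition (20 dimensions) and dipeptide composition (400 dimensions). It is an illustration under stated assumptions, not the authors' implementation. The function names are hypothetical, fusion by plain concatenation is an assumption (the abstract does not say how the views are combined), and the GCGR, NSI, and PseAAC features are not reproduced because the abstract does not define them.

```python
from itertools import product

# The 20 standard amino acids, in a fixed order that defines the feature layout.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(seq):
    """Return a 20-dimensional vector of residue frequencies.

    Non-standard residues (for example 'X') are ignored.
    """
    residues = [r for r in seq.upper() if r in AMINO_ACIDS]
    n = len(residues)
    return [residues.count(a) / n if n else 0.0 for a in AMINO_ACIDS]


def dipeptide_composition(seq):
    """Return a 400-dimensional vector of adjacent-pair frequencies.

    Pairs containing a non-standard residue are skipped, and the vector
    is normalized by the number of valid pairs.
    """
    seq = seq.upper()
    counts = {a + b: 0 for a, b in product(AMINO_ACIDS, repeat=2)}
    valid = 0
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:
            counts[pair] += 1
            valid += 1
    # Dicts preserve insertion order, so the layout matches the product order.
    return [counts[p] / valid if valid else 0.0 for p in counts]


def fuse_views(seq):
    """Fuse the two views by concatenation (an assumed, simplest-case fusion)."""
    return amino_acid_composition(seq) + dipeptide_composition(seq)


if __name__ == "__main__":
    # Hypothetical toy sequence, not taken from the CL317 or ZW225 datasets.
    features = fuse_views("MKVLAAGIK")
    print(len(features))
```

A fused vector of this kind would then be passed to a classifier such as a support vector machine and evaluated with a jackknife (leave-one-out) test, as the abstract describes; that step is omitted here to keep the sketch dependency-free.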

Pages: 13
