Prediction of Protein Subcellular Localization Based on Fusion of Multi-view Features

Cited by: 18
Authors
Li, Bo [1]
Cai, Lijun [1]
Liao, Bo [1, 2]
Fu, Xiangzheng [1]
Bing, Pingping [3]
Yang, Jialiang [2]
Affiliations
[1] Hunan Univ, Coll Informat Sci & Engn, Changsha 410082, Hunan, Peoples R China
[2] Hainan Normal Univ, Sch Math & Stat, Haikou 570100, Hainan, Peoples R China
[3] Changsha Med Univ, Acad Working Stn, Changsha 410219, Hunan, Peoples R China
Keywords
protein subcellular localization; protein primary sequence; generalized chaos game representation; statistical method; support vector machine; unitary distance; FEATURE-EXTRACTION; LOCATION
DOI
10.3390/molecules24050919
Chinese Library Classification
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification codes
071010; 081704
Abstract
The prediction of protein subcellular localization is critical for inferring protein functions, gene regulation, and protein-protein interactions. With advances in high-throughput sequencing technologies and proteomic methods, the protein sequences of numerous yeasts have become publicly available, which makes it possible to predict yeast protein subcellular localization computationally. However, widely used protein sequence representation techniques, such as amino acid composition and Chou's pseudo amino acid composition (PseAAC), have difficulty extracting adequate information about the interactions between residues and the position distribution of each residue. Novel sequence representations are therefore still needed. In this study, we present two novel protein sequence representation techniques: Generalized Chaos Game Representation (GCGR), based on the frequency and distribution of the residues in the protein primary sequence, and novel statistics and information theory (NSI), which reflects local position information of the sequence. In the GCGR + NSI representation, a protein primary sequence is represented by a 5-dimensional feature vector, whereas other popular methods such as PseAAC and dipeptide composition use features with hundreds of dimensions. In practice, this feature representation is highly efficient in predicting protein subcellular localization. Even without machine learning-based classifiers, a simple model based on the feature vector achieves prediction accuracies of 0.8825 and 0.7736 on the CL317 and ZW225 datasets, respectively. To further evaluate the effectiveness of the proposed encoding schemes, we introduce a multi-view feature-based method that combines the two above-mentioned features with other well-known features, including PseAAC and dipeptide composition, and uses a support vector machine as the classifier to predict protein subcellular localization. This model achieves prediction accuracies of 0.927 and 0.871 on the CL317 and ZW225 datasets, respectively, better than other existing methods in the jackknife tests. The results suggest that the GCGR and NSI features are useful complements to popular protein sequence representations in predicting yeast protein subcellular localization. Finally, we validate a few newly predicted protein subcellular localizations with evidence from published articles in authoritative journals and books.
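The general idea of fusing several sequence-derived feature views and scoring them with a jackknife (leave-one-out) test can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' method: it does not reproduce GCGR, NSI, or PseAAC, whose definitions are not given in this record. It fuses only amino acid composition and dipeptide composition by concatenation, and it swaps the support vector machine for a 1-nearest-neighbour rule so that the example needs no third-party library. All function names and the toy sequences are hypothetical.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered residue pairs, in a fixed order ("AA", "AC", ...).
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def aac(seq):
    """Amino acid composition: relative frequency of each of the 20 residues."""
    n = max(len(seq), 1)  # avoid division by zero on an empty sequence
    return [seq.count(a) / n for a in AMINO_ACIDS]


def dipeptide_composition(seq):
    """Relative frequency of each of the 400 adjacent residue pairs."""
    counts = dict.fromkeys(DIPEPTIDES, 0)
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:  # skip pairs containing non-standard residues
            counts[pair] += 1
    total = max(len(seq) - 1, 1)
    return [counts[d] / total for d in DIPEPTIDES]


def fuse(seq):
    """Fuse the two views by simple concatenation (20 + 400 = 420 features)."""
    return aac(seq) + dipeptide_composition(seq)


def jackknife_accuracy(features, labels):
    """Leave-one-out test: classify each sample by its nearest other sample.

    Assumption: 1-nearest-neighbour with squared Euclidean distance stands in
    for the support vector machine mentioned in the abstract.
    """
    correct = 0
    for i, x in enumerate(features):
        nearest = min(
            (j for j in range(len(features)) if j != i),
            key=lambda j: sum((a - b) ** 2 for a, b in zip(x, features[j])),
        )
        correct += labels[nearest] == labels[i]
    return correct / len(features)


# Hypothetical toy data: two made-up "locations" with two sequences each.
toy_sequences = ["AAAA", "AAAC", "WWWW", "WWWY"]
toy_labels = [0, 0, 1, 1]
toy_features = [fuse(s) for s in toy_sequences]
toy_accuracy = jackknife_accuracy(toy_features, toy_labels)
```

Concatenation is the simplest fusion strategy; in practice the views would usually be scaled or weighted before being combined, since a 400-dimensional view can otherwise dominate a much smaller one.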
Pages: 13