Use of Chou's 5-steps rule to predict the subcellular localization of gram-negative and gram-positive bacterial proteins by multilabel learning based on gene ontology annotation and profile alignment

Cited by: 4
Authors
Bouziane, Hafida [1]
Chouarfia, Abdallah [1]
Affiliations
[1] Univ Sci & Technol Oran Mohamed Boudiaf, Dept Informat, USTO MB BP 1505, El Mnaouer 31000, Oran, Algeria
Keywords
gene ontology terms; gram-negative bacteria; gram-positive bacteria; multi-label learning; profile alignment; subcellular localization prediction; SUPPORT VECTOR MACHINES; AMINO-ACID-COMPOSITION; WEB SERVER; FUNCTIONAL ANNOTATION; NEURAL-NETWORKS; LOCATION; TOOL; CLASSIFIER; SYSTEM; MODES
DOI
10.1515/jib-2019-0091
Chinese Library Classification (CLC) number
Q [Biological Sciences]
Discipline classification codes
07; 0710; 09
Abstract
To date, many proteins generated by large-scale genome sequencing projects remain uncharacterized and are under intensive investigation by both experimental and computational means. Knowledge of protein subcellular localization (SCL) is of key importance for elucidating protein function. However, SCL prediction remains a challenging task, especially for multi-site proteins, which shuttle between cell compartments to perform their biological functions, and for proteins that have no significant homology to proteins of known subcellular location. Owing to their low cost and reasonable accuracy, machine learning-based methods have gained much attention in this context, helped by the availability of a plethora of biological databases and annotated proteins for analysis and benchmarking. Various predictive models have been proposed to tackle the SCL problem using different protein sequence features related to subcellular localization. However, the overwhelming majority of them focus on single localization and cover a very limited number of cellular locations. Prediction was originally based on sorting signals, amino acid composition, and homology. To improve prediction quality, the focus is currently on knowledge extracted from annotation databases, such as protein-protein interactions and Gene Ontology (GO) functional domain annotation, which has recently become widely adopted and essential information for learning systems. To address this problem, in the present study we treated SCL prediction as a multi-label learning problem and attempted to label both single-site and multi-site unannotated bacterial protein sequences by mining protein homology relationships using both the GO terms of protein homologs and PSI-BLAST profiles.
Experiments using 5-fold cross-validation on the benchmark datasets showed a significant improvement in the results obtained with the proposed consensus multi-label prediction model, which discriminates six compartments for Gram-negative and five compartments for Gram-positive bacterial proteins.
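As a rough illustration of what a consensus multi-label decision can look like, the sketch below averages per-location scores from two hypothetical predictors (one GO-based, one profile-based) and keeps every location whose averaged score reaches a threshold. The abstract does not state how the authors combine their predictors, so the function name `consensus_labels`, the score dictionaries, the averaging rule, and the 0.5 threshold are all assumptions made for illustration only.

```python
def consensus_labels(go_scores, profile_scores, threshold=0.5):
    """Combine two per-location score dicts into a set of predicted locations.

    Assumption (not taken from the paper): scores lie in [0, 1], a location
    missing from one predictor counts as 0.0, and the consensus is a plain
    average followed by thresholding.
    """
    locations = set(go_scores) | set(profile_scores)
    if not locations:
        return set()

    # Average the two predictors' scores for every candidate location.
    averaged = {
        loc: (go_scores.get(loc, 0.0) + profile_scores.get(loc, 0.0)) / 2
        for loc in locations
    }

    # Multi-label step: keep every location at or above the threshold.
    chosen = {loc for loc, score in averaged.items() if score >= threshold}

    # Fallback so that each protein receives at least one location.
    if not chosen:
        chosen = {max(averaged, key=averaged.get)}
    return chosen


# Hypothetical scores for one Gram-negative protein.
go = {"cytoplasm": 0.9, "periplasm": 0.6}
profile = {"cytoplasm": 0.7, "periplasm": 0.5, "outer membrane": 0.1}
predicted = consensus_labels(go, profile)
```

Here a protein can receive more than one location, which is the property that distinguishes the multi-label setting from single-localization predictors; other combination rules (union, weighted voting) would fit the same interface.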
Pages: 51-79
Page count: 29