R3P-Loc: A compact multi-label predictor using ridge regression and random projection for protein subcellular localization

Cited by: 30
Authors
Wan, Shibiao [1]
Mak, Man-Wai [1]
Kung, Sun-Yuan [2]
Affiliations
[1] Hong Kong Polytech Univ, Dept Elect & Informat Engn, Hong Kong, Hong Kong, Peoples R China
[2] Princeton Univ, Dept Elect Engn, Princeton, NJ 08544 USA
Keywords
Multi-location proteins; Compact databases; Multi-label classification; AMINO-ACID-COMPOSITION; GENE ONTOLOGY; JOHNSON-LINDENSTRAUSS; LEARNING CLASSIFIER; LOCATION; SINGLE; PSEAAC; DATABASE; SITES; PLANT;
DOI
10.1016/j.jtbi.2014.06.031
Chinese Library Classification
Q [Biological Sciences];
Subject classification codes
07 ; 0710 ; 09 ;
Abstract
Locating proteins within cellular contexts is of paramount significance in elucidating their biological functions. Computational methods based on knowledge databases (such as the gene ontology annotation (GOA) database) are known to be more efficient than sequence-based methods. However, knowledge-based methods typically face three problems: (1) knowledge databases are enormous and growing exponentially, (2) knowledge databases contain redundant information, and (3) the number of features extracted from knowledge databases is much larger than the number of data samples with ground-truth labels. These properties make the extracted features liable to contain redundant or irrelevant information, causing the prediction systems to suffer from overfitting. To address these problems, this paper proposes an efficient multi-label predictor, namely R3P-Loc, which uses two compact databases for feature extraction and applies random projection (RP) to reduce the feature dimensions of an ensemble ridge regression (RR) classifier. Two new compact databases are created from the Swiss-Prot and GOA databases. These databases hold almost the same amount of information as their full-size counterparts but are much smaller. Experimental results on two recent datasets (eukaryote and plant) suggest that R3P-Loc can reduce the feature dimensions seven-fold and significantly outperforms state-of-the-art predictors. This paper also demonstrates that the compact databases reduce memory consumption by a factor of 39 without degrading prediction accuracy. For readers' convenience, the R3P-Loc server is available online at http://bioinfo.eie.polyu.edu.hk/R3PLocServer/. (C) 2014 Elsevier Ltd. All rights reserved.
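The abstract names the two building blocks of the predictor, random projection and an ensemble of ridge regression classifiers, but gives no implementation detail. The following is a minimal generic sketch of that combination, not the authors' implementation: the Gaussian projection matrix, the closed-form ridge solution, score averaging across ensemble members, the zero decision threshold, the at-least-one-label rule, and all function and parameter names are assumptions made for illustration. The synthetic data are random and stand in for real GO-based feature vectors.

```python
import numpy as np


def random_projection_matrix(n_in, n_out, rng):
    """Gaussian random projection (assumed form), scaled by 1/sqrt(n_out)."""
    return rng.standard_normal((n_in, n_out)) / np.sqrt(n_out)


def fit_ridge(X, Y, rho):
    """Closed-form ridge regression: W = (X^T X + rho I)^{-1} X^T Y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + rho * np.eye(d), X.T @ Y)


def fit_ensemble(X, Y, n_out, n_members, rho=1.0, seed=0):
    """Train one ridge model per independently drawn random projection."""
    rng = np.random.default_rng(seed)
    members = []
    for _ in range(n_members):
        R = random_projection_matrix(X.shape[1], n_out, rng)
        W = fit_ridge(X @ R, Y, rho)
        members.append((R, W))
    return members


def predict_scores(members, X):
    """Average the per-label scores of all ensemble members."""
    return np.mean([X @ R @ W for R, W in members], axis=0)


def predict_labels(scores, threshold=0.0):
    """Multi-label decision: every label scoring above the threshold,
    plus the top-scoring label so each sample gets at least one."""
    labels = scores > threshold
    labels[np.arange(len(scores)), scores.argmax(axis=1)] = True
    return labels


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.standard_normal((40, 70))                           # 40 samples, 70 features
    Y = np.where(rng.standard_normal((40, 3)) > 0, 1.0, -1.0)   # 3 labels in {-1, +1}
    members = fit_ensemble(X, Y, n_out=10, n_members=5)         # 70 -> 10 dimensions
    print(predict_labels(predict_scores(members, X)).sum(axis=1))
```

Projecting 70 features down to 10 mirrors the seven-fold reduction quoted in the abstract only as an illustrative ratio; the dimensions used in the paper are not stated in this record.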
Pages: 34-45
Number of pages: 12