Identifying subcellular localizations of mammalian protein complexes based on graph theory with a random forest algorithm

Authors
Li, Zhan-Chao [1]
Lai, Yan-Hua [2]
Chen, Li-Li [2]
Chen, Chao [3]
Xie, Yun [1]
Dai, Zong [2]
Zou, Xiao-Yong [2]
Affiliations
[1] Guangdong Pharmaceut Univ, Sch Chem & Chem Engn, Guangzhou 510006, Guangdong, Peoples R China
[2] Sun Yat Sen Univ, Sch Chem & Chem Engn, Guangzhou 510275, Guangdong, Peoples R China
[3] Guangdong Pharmaceut Univ, Sch Tradit Chinese Med, Guangzhou 510006, Guangdong, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
PREDICTION; SINGLE; SITES; PLANT
DOI
10.1039/c3mb25451h
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010; 081704
Abstract
In the post-genome era, one of the most important and challenging tasks is to identify the subcellular localizations of protein complexes and to further elucidate their functions in human health, with applications to understanding disease mechanisms, diagnosis and therapy. Although various experimental approaches have been developed and employed to identify the subcellular localizations of protein complexes, laboratory technologies lag far behind the rapid accumulation of known protein complexes. It is therefore highly desirable to develop a computational method that identifies the subcellular localizations of protein complexes rapidly and reliably. In this study, a novel method is proposed for predicting the subcellular localizations of mammalian protein complexes based on graph theory with a random forest algorithm. Protein complexes are modeled as weighted graphs in which nodes represent proteins, edges represent protein-protein interactions, and weights are descriptors of protein primary structures. Several topological structure features based on graph theory are proposed and adopted to characterize the protein complexes. A random forest is employed to construct a model and predict the subcellular localizations of protein complexes. On the training set, the accuracies obtained by 10-fold cross-validation for predicting plasma membrane/membrane attached, cytoplasm and nucleus are 84.78%, 71.30% and 82.00%, respectively. On the independent test set, the corresponding accuracies are 81.31%, 69.95% and 81.00%. These high prediction accuracies demonstrate the state-of-the-art performance of the proposed method. It is anticipated that the proposed method may become a useful high-throughput tool and play a complementary role to existing experimental techniques in identifying the subcellular localizations of mammalian protein complexes. The MATLAB source code and the dataset can be obtained freely on request from the authors.
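The graph representation described in the abstract can be illustrated with a minimal Python sketch. The abstract does not name the specific topological features or sequence descriptors, and the authors' code is in MATLAB, so everything here is an assumption made for illustration: the function name `complex_graph_features`, the use of a single scalar descriptor per protein as the node weight, and the particular features (node count, edge count, density, mean and maximum degree, and a sum of endpoint-weight products over edges).

```python
def complex_graph_features(node_weights, edges):
    """Return illustrative topological features for one protein complex.

    node_weights: dict mapping a protein ID to a scalar descriptor of its
                  primary structure (hypothetical stand-in for the paper's
                  descriptors, which the abstract does not specify).
    edges:        iterable of (protein_a, protein_b) interaction pairs.
    """
    nodes = sorted(node_weights)
    adjacency = {node: set() for node in nodes}

    # Build an undirected simple graph; self-interactions are ignored.
    for a, b in edges:
        if a == b:
            continue
        adjacency[a].add(b)
        adjacency[b].add(a)

    n_nodes = len(nodes)
    degrees = [len(adjacency[node]) for node in nodes]
    n_edges = sum(degrees) // 2  # each edge is counted from both endpoints

    # Density: fraction of possible protein pairs that actually interact.
    density = 2 * n_edges / (n_nodes * (n_nodes - 1)) if n_nodes > 1 else 0.0
    mean_degree = sum(degrees) / n_nodes if n_nodes else 0.0
    max_degree = max(degrees) if degrees else 0

    # Weighted term: sum over edges of the product of endpoint descriptors.
    weighted_edge_sum = sum(
        node_weights[a] * node_weights[b]
        for a in nodes
        for b in adjacency[a]
        if a < b  # visit each undirected edge once
    )

    return {
        "n_nodes": n_nodes,
        "n_edges": n_edges,
        "density": density,
        "mean_degree": mean_degree,
        "max_degree": max_degree,
        "weighted_edge_sum": weighted_edge_sum,
    }


# Toy complex with three proteins in a chain: P1 - P2 - P3.
features = complex_graph_features(
    {"P1": 0.5, "P2": 1.0, "P3": 2.0},
    [("P1", "P2"), ("P2", "P3")],
)
print(features)
```

In a pipeline of the kind the abstract describes, one such feature vector per complex would be passed, together with its localization label, to a random forest classifier. That training step is omitted here because the abstract gives no details about the model's settings.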
Pages: 658-667
Number of pages: 10