Identifying subcellular localizations of mammalian protein complexes based on graph theory with a random forest algorithm

Cited by: 6
Authors
Li, Zhan-Chao [1]
Lai, Yan-Hua [2]
Chen, Li-Li [2]
Chen, Chao [3]
Xie, Yun [1]
Dai, Zong [2]
Zou, Xiao-Yong [2]
Affiliations
[1] Guangdong Pharmaceut Univ, Sch Chem & Chem Engn, Guangzhou 510006, Guangdong, Peoples R China
[2] Sun Yat Sen Univ, Sch Chem & Chem Engn, Guangzhou 510275, Guangdong, Peoples R China
[3] Guangdong Pharmaceut Univ, Sch Tradit Chinese Med, Guangzhou 510006, Guangdong, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
PREDICTION; SINGLE; SITES; PLANT;
DOI
10.1039/c3mb25451h
Chinese Library Classification
Q5 [Biochemistry]; Q7 [Molecular Biology];
Subject classification codes
071010 ; 081704 ;
Abstract
In the post-genome era, one of the most important and challenging tasks is to identify the subcellular localizations of protein complexes and to elucidate their functions in human health, with applications to understanding disease mechanisms, diagnosis and therapy. Although various experimental approaches have been developed and employed to identify the subcellular localizations of protein complexes, laboratory technologies lag far behind the rapid accumulation of known protein complexes. It is therefore highly desirable to develop a computational method that identifies the subcellular localizations of protein complexes rapidly and reliably. In this study, a novel method is proposed for predicting subcellular localizations of mammalian protein complexes based on graph theory with a random forest algorithm. Protein complexes are modeled as weighted graphs containing nodes and edges, where nodes represent proteins, edges represent protein-protein interactions, and weights are descriptors of protein primary structures. Several topological structure features based on graph theory are proposed and adopted to characterize protein complexes. Random forest is employed to construct a model and predict the subcellular localizations of protein complexes. Accuracies on the training set under a 10-fold cross-validation test for predicting plasma membrane/membrane attached, cytoplasm and nucleus are 84.78%, 71.30% and 82.00%, respectively. Accuracies on the independent test set are 81.31%, 69.95% and 81.00%, respectively. These prediction accuracies indicate state-of-the-art performance for the proposed method. It is anticipated that the proposed method may become a useful high-throughput tool and play a complementary role to the existing experimental techniques in identifying subcellular localizations of mammalian protein complexes. The Matlab source code and the dataset can be obtained freely on request from the authors.
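The graph model described in the abstract can be illustrated with a minimal sketch. It is not the authors' Matlab implementation, and the features below are not the topological features proposed in the paper, which the abstract does not list. The function name `complex_graph_features`, the single numeric weight per protein, and the chosen descriptors (density, degree statistics, mean product of end-point weights) are all assumptions made for illustration.

```python
def complex_graph_features(node_weights, edges):
    """Summarize one protein complex modeled as a node-weighted graph.

    node_weights: dict mapping protein id -> numeric descriptor
                  (hypothetical stand-in for a primary-structure descriptor).
    edges:        iterable of (protein_a, protein_b) interaction pairs.
    Returns a dict of simple, illustrative topological features.
    """
    # Treat interactions as undirected; drop self-loops and duplicates.
    unique_edges = {frozenset((a, b)) for a, b in edges if a != b}

    n = len(node_weights)
    m = len(unique_edges)

    # Degree of each protein (a KeyError here means an edge names
    # a protein that has no weight in node_weights).
    degree = {protein: 0 for protein in node_weights}
    for edge in unique_edges:
        for protein in edge:
            degree[protein] += 1

    # Fraction of possible interactions that are present.
    density = 2 * m / (n * (n - 1)) if n > 1 else 0.0

    # Couple topology with node weights: average product of the
    # descriptors at the two ends of each interaction.
    products = []
    for edge in unique_edges:
        a, b = tuple(edge)
        products.append(node_weights[a] * node_weights[b])
    mean_weight_product = sum(products) / m if m else 0.0

    return {
        "n_nodes": n,
        "n_edges": m,
        "density": density,
        "mean_degree": 2 * m / n if n else 0.0,
        "max_degree": max(degree.values(), default=0),
        "mean_weight_product": mean_weight_product,
    }


# Toy complex with three proteins and two interactions (made-up values).
features = complex_graph_features(
    {"A": 1.0, "B": 2.0, "C": 3.0},
    [("A", "B"), ("B", "C")],
)
```

A fixed-length feature vector of this kind, computed per complex, is the sort of input a random forest classifier could be trained on and evaluated with 10-fold cross-validation, as the abstract describes.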
Pages: 658-667
Page count: 10