Identifying subcellular localizations of mammalian protein complexes based on graph theory with a random forest algorithm

Cited by: 6
Authors
Li, Zhan-Chao [1]
Lai, Yan-Hua [2]
Chen, Li-Li [2]
Chen, Chao [3]
Xie, Yun [1]
Dai, Zong [2]
Zou, Xiao-Yong [2]
Affiliations
[1] Guangdong Pharmaceut Univ, Sch Chem & Chem Engn, Guangzhou 510006, Guangdong, Peoples R China
[2] Sun Yat Sen Univ, Sch Chem & Chem Engn, Guangzhou 510275, Guangdong, Peoples R China
[3] Guangdong Pharmaceut Univ, Sch Tradit Chinese Med, Guangzhou 510006, Guangdong, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
PREDICTION; SINGLE; SITES; PLANT;
DOI
10.1039/c3mb25451h
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010; 081704
Abstract
In the post-genome era, one of the most important and challenging tasks is to identify the subcellular localizations of protein complexes and to further elucidate their functions in human health, with applications to understanding disease mechanisms, diagnosis and therapy. Although various experimental approaches have been developed and employed to identify the subcellular localizations of protein complexes, laboratory techniques lag far behind the rapid accumulation of known protein complexes. It is therefore highly desirable to develop a computational method that identifies these localizations rapidly and reliably. In this study, a novel method is proposed for predicting subcellular localizations of mammalian protein complexes based on graph theory with a random forest algorithm. Protein complexes are modeled as weighted graphs, where nodes represent proteins, edges represent protein-protein interactions and weights are descriptors of protein primary structures. Several topological structure features based on graph theory are proposed and adopted to characterize protein complexes. A random forest is employed to construct the model and predict the subcellular localizations of protein complexes. On the training set, the accuracies in a 10-fold cross-validation test for predicting plasma membrane/membrane attached, cytoplasm and nucleus are 84.78%, 71.30% and 82.00%, respectively. On the independent test set, the corresponding accuracies are 81.31%, 69.95% and 81.00%. These high prediction accuracies indicate state-of-the-art performance for the current method. It is anticipated that the proposed method may become a useful high-throughput tool and play a complementary role to existing experimental techniques in identifying subcellular localizations of mammalian protein complexes. The Matlab source code and the dataset can be obtained freely on request from the authors.
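The graph representation described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' Matlab implementation, and the abstract does not specify which descriptors or topological features were used. Everything here is an assumption made for illustration: the `node_weight` descriptor (fraction of hydrophobic residues), the residue grouping, the function names and the five features returned. In the paper's pipeline, feature vectors of this kind would be passed to a random forest classifier.

```python
# Assumed residue grouping, used only to get a numeric node weight.
HYDROPHOBIC = set("AVILMFWC")


def node_weight(sequence):
    """Hypothetical primary-structure descriptor: fraction of hydrophobic residues."""
    return sum(res in HYDROPHOBIC for res in sequence) / len(sequence)


def complex_features(sequences, interactions):
    """Return an illustrative topological feature vector for one protein complex.

    sequences:    dict mapping protein id -> amino-acid sequence (graph nodes)
    interactions: iterable of (id, id) pairs (graph edges, undirected)
    """
    if not sequences:
        raise ValueError("a complex needs at least one protein")

    weights = {pid: node_weight(seq) for pid, seq in sequences.items()}

    # Undirected edges: drop self-loops and collapse (a, b) / (b, a) duplicates.
    edges = {frozenset(pair) for pair in interactions if pair[0] != pair[1]}

    n, m = len(sequences), len(edges)
    density = 2 * m / (n * (n - 1)) if n > 1 else 0.0
    mean_degree = 2 * m / n

    # Weighted index: sum over edges of the product of the endpoint weights.
    weighted_edge_sum = 0.0
    for edge in edges:
        a, b = tuple(edge)
        weighted_edge_sum += weights[a] * weights[b]

    return [n, m, density, mean_degree, weighted_edge_sum]


# Toy complex with three proteins and two distinct interactions.
toy_sequences = {"A": "AVGG", "B": "ILKK", "C": "GGGG"}
toy_interactions = [("A", "B"), ("B", "C"), ("B", "A")]
features = complex_features(toy_sequences, toy_interactions)
print(features)
```

Each complex yields a fixed-length vector regardless of how many proteins it contains, which is what lets complexes of different sizes be fed to a single classifier.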
Pages: 658-667
Page count: 10