Identifying subcellular localizations of mammalian protein complexes based on graph theory with a random forest algorithm

Cited by: 6

Authors:

Li, Zhan-Chao [1]

Lai, Yan-Hua [2]

Chen, Li-Li [2]

Chen, Chao [3]

Xie, Yun [1]

Dai, Zong [2]

Zou, Xiao-Yong [2]

Affiliations:

[1] Guangdong Pharmaceut Univ, Sch Chem & Chem Engn, Guangzhou 510006, Guangdong, Peoples R China

[2] Sun Yat Sen Univ, Sch Chem & Chem Engn, Guangzhou 510275, Guangdong, Peoples R China

[3] Guangdong Pharmaceut Univ, Sch Tradit Chinese Med, Guangzhou 510006, Guangdong, Peoples R China

Source:

MOLECULAR BIOSYSTEMS | 2013 / Vol. 9 / Issue 04

Funding:

National Natural Science Foundation of China;

Keywords:

PREDICTION; SINGLE; SITES; PLANT;

DOI:

10.1039/c3mb25451h

Chinese Library Classification:

Q5 [Biochemistry]; Q7 [Molecular Biology];

Subject classification codes:

071010 ; 081704 ;

Abstract:

In the post-genome era, one of the most important and challenging tasks is to identify the subcellular localizations of protein complexes and to further elucidate their functions in human health, with applications to understanding disease mechanisms, diagnosis and therapy. Although various experimental approaches have been developed and employed to identify the subcellular localizations of protein complexes, laboratory technologies fall far behind the rapid accumulation of protein complexes. Therefore, it is highly desirable to develop a computational method to rapidly and reliably identify the subcellular localizations of protein complexes. In this study, a novel method is proposed for predicting subcellular localizations of mammalian protein complexes based on graph theory with a random forest algorithm. Protein complexes are modeled as weighted graphs containing nodes and edges, where nodes represent proteins, edges represent protein-protein interactions and weights are descriptors of protein primary structures. Several topological structure features are proposed and adopted to characterize protein complexes based on graph theory. Random forest is employed to construct a model and predict subcellular localizations of protein complexes. Accuracies on the training set in a 10-fold cross-validation test for predicting plasma membrane/membrane attached, cytoplasm and nucleus are 84.78%, 71.30% and 82.00%, respectively. Accuracies on the independent test set are 81.31%, 69.95% and 81.00%, respectively. These high prediction accuracies demonstrate the state-of-the-art performance of the current method. It is anticipated that the proposed method may become a useful high-throughput tool and play a complementary role to the existing experimental techniques in identifying subcellular localizations of mammalian protein complexes. The Matlab source code and the dataset can be obtained freely on request from the authors.
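
The graph model described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' Matlab code, and the abstract does not list the actual topological features, so everything here is an assumption for illustration: the function names `build_complex_graph` and `topological_features`, the use of a single scalar descriptor per protein as the node weight, the toy protein names and values, and the four generic descriptors computed (node count, edge count, density, and a weight-based edge sum).

```python
def build_complex_graph(node_weights, interactions):
    """Build an undirected adjacency map for one protein complex.

    node_weights: dict mapping protein name -> scalar descriptor (assumed form).
    interactions: iterable of (protein_a, protein_b) pairs.
    """
    adjacency = {protein: set() for protein in node_weights}
    for a, b in interactions:
        if a not in adjacency or b not in adjacency:
            raise KeyError(f"interaction ({a}, {b}) uses an unknown protein")
        if a == b:
            continue  # ignore self-interactions in this sketch
        adjacency[a].add(b)
        adjacency[b].add(a)
    return adjacency


def topological_features(node_weights, adjacency):
    """Compute a few generic graph descriptors (hypothetical feature set)."""
    n = len(adjacency)
    # Each undirected edge appears twice in the adjacency map.
    m = sum(len(neighbours) for neighbours in adjacency.values()) // 2
    density = 2 * m / (n * (n - 1)) if n > 1 else 0.0
    # Sum over edges of the product of endpoint weights; a < b counts each edge once.
    weighted_edge_sum = sum(
        node_weights[a] * node_weights[b]
        for a in adjacency
        for b in adjacency[a]
        if a < b
    )
    return {
        "n_proteins": n,
        "n_interactions": m,
        "density": density,
        "weighted_edge_sum": weighted_edge_sum,
    }


# Toy complex with made-up protein names and descriptor values.
toy_weights = {"P1": 0.5, "P2": 1.0, "P3": 1.5, "P4": 2.0}
toy_edges = [("P1", "P2"), ("P2", "P3"), ("P3", "P4"), ("P1", "P3")]
toy_graph = build_complex_graph(toy_weights, toy_edges)
toy_features = topological_features(toy_weights, toy_graph)
print(toy_features)
```

A feature dictionary of this kind, computed for each complex, is the sort of fixed-length input that a random forest classifier could be trained on; the choice of classifier library and hyperparameters is left open here because the abstract does not specify them.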


Pages: 658-667

Page count: 10

