Identifying functions of protein complexes based on topology similarity with random forest

Cited by: 4
Authors
Li, Zhan-Chao [1]
Lai, Yan-Hua [2]
Chen, Li-Li [2]
Xie, Yun [1]
Dai, Zong [2]
Zou, Xiao-Yong [2]
Affiliations
[1] Guangdong Pharmaceut Univ, Sch Chem & Chem Engn, Guangzhou 510006, Guangdong, Peoples R China
[2] Sun Yat Sen Univ, Sch Chem & Chem Engn, Guangzhou 510275, Guangdong, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
INTERACTION NETWORKS; COMMUNITY STRUCTURE; PREDICTION; CLASSIFICATION; ALGORITHM; DISCOVERY; RESOURCE;
DOI
10.1039/c3mb70401g
Chinese Library Classification
Q5 [Biochemistry]; Q7 [Molecular Biology];
Subject classification codes
071010; 081704;
Abstract
Elucidating the functions of protein complexes is critical for understanding disease mechanisms and for diagnosis and therapy. This study starts from the premise that protein complexes with similar topology may have similar functions. First, we model protein complexes as weighted graphs in which nodes represent proteins and edges represent interactions between proteins. Second, we use graph-theoretic topology features derived from these graphs to characterize the complexes. Finally, we combine the topology features with a random forest to construct a predictor of protein complex function. The method is evaluated by identifying the functions of mammalian protein complexes, and the predictor is then applied to protein complexes retrieved from human protein-protein interaction networks. We identify some protein complexes with significant roles in the occurrence of tumors, vesicles and retinoblastoma. The research is expected to have an important impact on the study of pathogenesis and on the pharmaceutical industry. The Matlab source code and the dataset are freely available on request from the authors.
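The first two steps of the abstract (a weighted graph per complex, then topology features) can be illustrated with a minimal Python sketch. This is not the authors' Matlab code: the function name `topology_features`, the four features chosen, and the example complex with its weights are all assumptions made for illustration, since the abstract does not list the actual feature set.

```python
from itertools import combinations


def topology_features(edges):
    """Compute a few illustrative topology features of one protein complex.

    `edges` is a list of (protein_u, protein_v, weight) tuples. Each
    undirected interaction is assumed to be listed exactly once.
    The features below are hypothetical stand-ins, not the paper's set.
    """
    if not edges:
        raise ValueError("a complex needs at least one interaction")

    adjacency = {}  # protein -> set of interacting proteins
    strength = {}   # protein -> sum of weights of its interactions
    for u, v, w in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
        strength[u] = strength.get(u, 0.0) + w
        strength[v] = strength.get(v, 0.0) + w

    n = len(adjacency)  # number of proteins (nodes)
    m = len(edges)      # number of interactions (edges)

    def local_clustering(node):
        # Fraction of neighbour pairs that also interact with each other.
        neighbours = adjacency[node]
        k = len(neighbours)
        if k < 2:
            return 0.0
        linked = sum(1 for a, b in combinations(neighbours, 2)
                     if b in adjacency[a])
        return 2.0 * linked / (k * (k - 1))

    return {
        "density": 2.0 * m / (n * (n - 1)),
        "mean_degree": 2.0 * m / n,
        "mean_strength": sum(strength.values()) / n,
        "mean_clustering": sum(local_clustering(x) for x in adjacency) / n,
    }


# Hypothetical four-protein complex; the weights are made up.
example_edges = [("A", "B", 0.9), ("B", "C", 0.8),
                 ("A", "C", 0.7), ("C", "D", 0.5)]
features = topology_features(example_edges)
```

Under these assumptions, the feature vector of each complex, paired with its known function label, could then be passed to any random forest implementation (scikit-learn's `RandomForestClassifier` is one option) for the final prediction step.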
Pages: 514-525
Page count: 12
Related papers
50 in total
  • [1] Identifying subcellular localizations of mammalian protein complexes based on graph theory with a random forest algorithm
    Li, Zhan-Chao
    Lai, Yan-Hua
    Chen, Li-Li
    Chen, Chao
    Xie, Yun
    Dai, Zong
    Zou, Xiao-Yong
    [J]. MOLECULAR BIOSYSTEMS, 2013, 9 (04) : 658 - 667
  • [2] DISTANCE FUNCTIONS, CRITICAL POINTS, AND THE TOPOLOGY OF RANDOM CECH COMPLEXES
    Bobrowski, Omer
    Adler, Robert J.
    [J]. HOMOLOGY HOMOTOPY AND APPLICATIONS, 2014, 16 (02) : 311 - 344
  • [3] A New Method for Identifying Essential Proteins Based on Network Topology Properties and Protein Complexes
    Qin, Chao
    Sun, Yongqi
    Dong, Yadong
    [J]. PLOS ONE, 2016, 11 (08):
  • [4] A novel method for identifying disease associated protein complexes based on functional similarity protein complex networks
    Duc-Hau Le
    [J]. Algorithms for Molecular Biology, 10
  • [5] A novel method for identifying disease associated protein complexes based on functional similarity protein complex networks
    Le, Duc-Hau
    [J]. ALGORITHMS FOR MOLECULAR BIOLOGY, 2015, 10
  • [6] Identifying the topology of protein complexes from affinity purification assays
    Friedel, Caroline C.
    Zimmer, Ralf
    [J]. BIOINFORMATICS, 2009, 25 (16) : 2140 - 2146
  • [7] Similarity based on the importance of common features in random forest
    Chen, Xiao
    Han, Li
    Leng, Meng
    Pan, Xiao
    [J]. International Journal of Performability Engineering, 2019, 15 (04) : 1171 - 1180
  • [8] Identifying commuters based on random forest of smartcard data
    Mei, Zhenyu
    Ding, Wenchao
    Feng, Chi
    Shen, Liting
    [J]. IET INTELLIGENT TRANSPORT SYSTEMS, 2020, 14 (04) : 207 - 212
  • [9] Random forest similarity for protein-protein interaction prediction from multiple sources
    Qi, YJ
    Klein-Seetharaman, J
    Bar-Joseph, Z
    [J]. PACIFIC SYMPOSIUM ON BIOCOMPUTING 2005, 2005, : 531 - 542
  • [10] Topology of random clique complexes
    Kahle, Matthew
    [J]. DISCRETE MATHEMATICS, 2009, 309 (06) : 1658 - 1671