Effect of Reference Genome Selection on the Performance of Computational Methods for Genome-Wide Protein-Protein Interaction Prediction

Cited by: 18
Authors
Muley, Vijaykumar Yogesh [1, 2]
Ranjan, Akash [1]
Affiliations
[1] Centre for DNA Fingerprinting and Diagnostics, Computational and Functional Genomics Group, Hyderabad, Andhra Pradesh, India
[2] Dr. Babasaheb Ambedkar Marathwada University, Department of Biotechnology, Sub-Centre, Osmanabad, Maharashtra, India
Source
PLOS ONE | 2012 / Vol. 7 / Issue 7
Keywords
ESCHERICHIA-COLI; FUNCTIONAL LINKAGES; PHYLOGENETIC PROFILES; CONTEXT METHODS; GENE ORDER; NETWORKS; DATABASE; COEVOLUTION; EVOLUTION; CONSERVATION
DOI
10.1371/journal.pone.0042057
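Chinese Library Classification
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]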
Subject classification codes
07; 0710; 09
Abstract
Background: Recent progress in computational methods for predicting physical and functional protein-protein interactions has provided new insights into the complexity of biological processes. Most of these methods assume that functionally interacting proteins are likely to share an evolutionary history. For the protein pairs of a query genome, this history can be traced by correlating different evolutionary aspects of their homologs across multiple genomes, known as reference genomes. These methods include phylogenetic profiling, gene neighborhood, and co-occurrence of orthologous protein-coding genes in the same cluster or operon, and are collectively known as genomic context methods. In contrast, the mirrortree method is based on the similarity of the phylogenetic trees of two interacting proteins. Comprehensive performance analyses of these methods have frequently been reported in the literature; however, very few studies provide insight into how the selection of reference genomes affects the detection of meaningful protein interactions.

Methods: We analyzed the performance of four methods and their variants to understand the effect of reference genome selection on prediction efficacy. We used six sets of reference genomes, sampled from 565 bacteria according to phylogenetic diversity and the relationships between organisms. We used Escherichia coli as a model organism, and the gold-standard datasets of interacting proteins reported in the DIP, EcoCyc and KEGG databases, to compare the performance of the prediction methods.

Conclusions: Higher performance in predicting protein-protein interactions was achievable even with 100-150 of the 565 bacterial genomes. Including archaeal genomes in the reference genome set improves performance. We find that, to obtain good performance, it is better to sample only a few genomes from related prokaryotic genera out of the large number of available genomes. Moreover, such sampling allows 50-100 genomes to be selected with comparable prediction accuracy when computational resources are limited.
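To make the role of reference genomes concrete, here is a minimal Python sketch, not the authors' implementation, of two kinds of score named in the abstract. The phylogenetic-profile score used here (1 minus the normalized Hamming distance between presence/absence vectors) is only one of several possible profile similarity measures, and the mirrortree-style score is a Pearson correlation between the inter-ortholog distances of two proteins. The genome names (G1 to G6), ortholog sets and distance values are hypothetical toy data; ortholog detection and distance estimation are assumed to have been done beforehand.

```python
# Hypothetical sketch: how the chosen reference genomes enter two
# coevolution scores. Not the authors' code; toy data only.
from itertools import combinations
from math import sqrt


def phylogenetic_profile(ortholog_genomes, reference_genomes):
    """Presence (1) / absence (0) of a protein's ortholog in each reference genome."""
    return [1 if g in ortholog_genomes else 0 for g in reference_genomes]


def profile_similarity(p, q):
    """1 - normalized Hamming distance: fraction of genomes where p and q agree."""
    if not p or len(p) != len(q):
        raise ValueError("profiles must be non-empty and of equal length")
    return sum(a == b for a, b in zip(p, q)) / len(p)


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # correlation undefined for constant input; report no signal
    return sxy / sqrt(sxx * syy)


def mirrortree_score(dist_a, dist_b, reference_genomes):
    """Mirrortree-style score: correlate the inter-ortholog distances of
    proteins A and B over genome pairs where both have orthologs.
    dist_a / dist_b map a sorted genome pair (g1, g2) to a distance."""
    pairs = [pair for pair in combinations(sorted(reference_genomes), 2)
             if pair in dist_a and pair in dist_b]
    if len(pairs) < 3:
        raise ValueError("too few shared genome pairs for a correlation")
    return pearson([dist_a[pair] for pair in pairs],
                   [dist_b[pair] for pair in pairs])


if __name__ == "__main__":
    all_genomes = ["G1", "G2", "G3", "G4", "G5", "G6"]
    subset = ["G1", "G2", "G3", "G4"]
    orthologs_a = {"G1", "G2", "G4"}
    orthologs_b = {"G1", "G2", "G4", "G6"}
    for name, ref in (("all", all_genomes), ("subset", subset)):
        pa = phylogenetic_profile(orthologs_a, ref)
        pb = phylogenetic_profile(orthologs_b, ref)
        print(name, f"{profile_similarity(pa, pb):.3f}")
    # Toy distances: B's distances are exactly twice A's, so r should be ~1.
    dist_a = {pair: float(i + 1) for i, pair in enumerate(combinations(subset, 2))}
    dist_b = {pair: 2 * d for pair, d in dist_a.items()}
    print("mirrortree", f"{mirrortree_score(dist_a, dist_b, subset):.3f}")
```

With the six-genome reference set the toy profile similarity is about 0.833; restricting the set to four genomes raises it to 1.000 for the same protein pair, which illustrates why the choice of reference genomes can change predictions. Note that this simple agreement score counts shared absences as agreement.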
Pages: 13
Related papers (50 in total)
  • [41] Minimalist ensemble algorithms for genome-wide protein localization prediction
    Lin, Jhih-Rong
    Mondal, Ananda Mohan
    Liu, Rong
    Hu, Jianjun
    BMC BIOINFORMATICS, 13
  • [42] Structure-based prediction of protein-protein interactions on a genome-wide scale (vol 490, pg 556, 2012)
    Zhang, Qiangfeng Cliff
    Petrey, Donald
    Deng, Lei
    Qiang, Li
    Sin, Yu
    Thu, Chan Aye
    Bisikirska, Brygida
    Lefebvre, Celine
    Accili, Domenico
    Hunter, Tony
    Maniatis, Tom
    Califano, Andrea
    Honig, Barry
    NATURE, 2013, 495 (7439) : 127 - 127
  • [43] Genome-wide analyses of member identification, expression pattern, and protein-protein interaction of EPF/EPFL gene family in Gossypium
    Li, Pengtao
    Zhao, Zilin
    Wang, Wenkui
    Wang, Tao
    Hu, Nan
    Wei, Yangyang
    Sun, Zhihao
    Chen, Yu
    Li, Yanfang
    Liu, Qiankun
    Yang, Shuhan
    Gong, Juwu
    Xiao, Xianghui
    Liu, Yuling
    Shi, Yuzhen
    Peng, Renhai
    Lu, Quanwei
    Yuan, Youlu
    BMC PLANT BIOLOGY, 2024, 24 (01)
  • [44] Expert Knowledge from Protein-Protein Interaction Databases to Guide Genome-Wide Genetic Analysis of Common Human Diseases
    Pattin, Kristine A.
    Gui, Jiang
    Moore, Jason
    GENETIC EPIDEMIOLOGY, 2009, 33 (08) : 806 - 806
  • [45] Enzymatic methods for genome-wide profiling of protein binding sites
    Policastro, Robert A.
    Zentner, Gabriel E.
    BRIEFINGS IN FUNCTIONAL GENOMICS, 2018, 17 (02) : 138 - 145
  • [46] Experimental and computational procedures for the assessment of protein complexes on a genome-wide scale
    Musso, Gabriel A.
    Zhang, Zhaolei
    Emili, Andrew
    CHEMICAL REVIEWS, 2007, 107 (08) : 3585 - 3600
  • [47] Fast and accurate genome-wide predictions and structural modeling of protein-protein interactions using Galaxy
    Guerler, Aysam
    Baker, Dannon
    van den Beek, Marius
    Gruening, Bjoern
    Bouvier, Dave
    Coraor, Nate
    Shank, Stephen D.
    Zehr, Jordan D.
    Schatz, Michael C.
    Nekrutenko, Anton
    BMC BIOINFORMATICS, 2023, 24 (01)
  • [48] EpICS: A System for Genome-wide Epistasis and Genetic Variation Analysis using Protein-Protein Interactions
    Sultana, Kazi Zakia
    Bhattacharjee, Anupam
    Jamil, Hasan
    BIBMW: 2009 IEEE INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOMEDICINE WORKSHOP, 2009 : 256 - 261
  • [49] Improving Protein-protein Interaction Prediction by Incorporating 3D Genome Information
    Guo, Zehua
    Su, Kai
    Liu, Liangjie
    Su, Xianbin
    Feng, Mofan
    Cao, Song
    Zhang, Mingxuan
    Chi, Runqiu
    Meng, Luming
    He, Guang
    Shi, Yi
    BIOINFORMATICS RESEARCH AND APPLICATIONS, ISBRA 2021, 2021, 13064 : 511 - 520
  • [50] A survey of computational methods in protein-protein interaction networks
    Rasti, Saeid
    Vogiatzis, Chrysafis
    ANNALS OF OPERATIONS RESEARCH, 2019, 276 (1-2) : 35 - 87