A systematic study of genome context methods: calibration, normalization and combination

Cited by: 16
Authors
Ferrer, Luciana [1]
Dale, Joseph M. [1]
Karp, Peter D. [1]
Affiliations
[1] SRI Int, Ctr Artificial Intelligence, Menlo Pk, CA 94025 USA
Source
BMC BIOINFORMATICS | 2010 / Vol. 11
Keywords
PROTEIN-PROTEIN INTERACTIONS; BIOCYC COLLECTION; FUNCTIONAL LINKAGES; METABOLIC PATHWAYS; ESCHERICHIA-COLI; METACYC DATABASE; PREDICTION; ENZYMES; NETWORKS
DOI
10.1186/1471-2105-11-493
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification code
071010; 081704
Abstract
Background: Genome context methods have been introduced in the last decade as automatic methods to predict functional relatedness between genes in a target genome using the patterns of existence and relative locations of the homologs of those genes in a set of reference genomes. Much work has been done in applying these methods to different bioinformatics tasks, but few papers present a systematic study of the methods and of the combination needed for their optimal use.
Results: We present a thorough study of the four main families of genome context methods found in the literature: phylogenetic profile, gene fusion, gene cluster, and gene neighbor. We find that for most organisms the gene neighbor method outperforms the phylogenetic profile method by as much as 40% in sensitivity, and is competitive with the gene cluster method at low sensitivities. Gene fusion is generally the worst performing of the four methods. A thorough exploration of the parameter space for each method is performed, and results across different target organisms are presented. We propose applying normalization procedures like those used on microarray data to the genome context scores. We show that substantial gains can be achieved with a simple normalization technique. In particular, the sensitivity of the phylogenetic profile method is improved by around 25% after normalization, resulting, to our knowledge, in the best-performing phylogenetic profile system in the literature. Finally, we show results from combining the various genome context methods into a single score. When using a cross-validation procedure to train the combiners, with both original and normalized scores as input, a decision tree combiner results in gains of up to 20% with respect to the gene neighbor method. Overall, this represents a gain of around 15% over what can be considered the state of the art in this area: the four original genome context methods combined using a procedure like that used in the STRING database. Unfortunately, we find that these gains disappear when the combiner is trained only with organisms that are phylogenetically distant from the target organism.
Conclusions: Our experiments indicate that gene neighbor is the best individual genome context method and that gains from the combination of individual methods are very sensitive to the training data used to obtain the combiner's parameters. If adequate training data is not available, using the gene neighbor score by itself instead of a combined score might be the best choice.
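As a rough illustration of the phylogenetic profile idea mentioned in the abstract (scoring gene pairs by how similarly their homologs occur across reference genomes), the following minimal Python sketch may help. It is not the scoring function used in the paper, which the abstract does not specify: the Jaccard similarity, the function names, and the toy gene and genome identifiers are all assumptions made for illustration.

```python
from itertools import combinations


def jaccard(a: frozenset, b: frozenset) -> float:
    """Jaccard similarity of two presence sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def profile_scores(profiles: dict) -> dict:
    """Score every gene pair by the similarity of their presence profiles.

    `profiles` maps a target-genome gene to the set of reference genomes
    in which that gene has a homolog (an assumed input format).
    """
    return {
        (g1, g2): jaccard(profiles[g1], profiles[g2])
        for g1, g2 in combinations(sorted(profiles), 2)
    }


# Hypothetical toy data, not taken from the paper.
profiles = {
    "geneA": frozenset({"ref1", "ref2", "ref3"}),
    "geneB": frozenset({"ref1", "ref2", "ref3"}),
    "geneC": frozenset({"ref4"}),
}
scores = profile_scores(profiles)
```

In this toy setup, genes with identical presence patterns receive the highest score and genes with disjoint patterns the lowest; a real system would also need the calibration and normalization steps the abstract describes.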
Pages: 25