Exact score distribution computation for ontological similarity searches

Times cited: 11
Authors
Schulz, Marcel H. [1, 2]
Koehler, Sebastian [3, 4]
Bauer, Sebastian [3]
Robinson, Peter N. [1, 3, 4]
Affiliations
[1] Max Planck Inst Mol Genet, D-14195 Berlin, Germany
[2] Carnegie Mellon Univ, Ray & Stephanie Lane Ctr Computat Biol, Pittsburgh, PA 15213 USA
[3] Charite, Inst Med Genet & Human Genet, D-13353 Berlin, Germany
[4] Charite, Berlin Brandenburg Ctr Regenerat Therapies BCRT, D-13353 Berlin, Germany
Source
BMC BIOINFORMATICS | 2011 / Vol. 12
Keywords
SEMANTIC SIMILARITY; GENE ONTOLOGY; SEQUENCE; EXPRESSION; TOOL;
DOI
10.1186/1471-2105-12-441
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Background: Semantic similarity searches in ontologies are an important component of many bioinformatic algorithms, e.g., finding functionally related proteins with the Gene Ontology or phenotypically similar diseases with the Human Phenotype Ontology (HPO). We have recently shown that the performance of semantic similarity searches can be improved by ranking results according to the probability of obtaining a given score at random rather than by the scores themselves. However, to date, there are no algorithms for computing the exact distribution of semantic similarity scores, which is necessary for computing the exact P-value of a given score. Results: In this paper we consider the exact computation of score distributions for similarity searches in ontologies, and introduce a simple null hypothesis that can be used to compute a P-value for the statistical significance of similarity scores. We concentrate on measures based on Resnik's definition of ontological similarity. A new algorithm is proposed that collapses subgraphs of the ontology graph and thereby allows fast score distribution computation. The new algorithm is several orders of magnitude faster than the naive approach, as we demonstrate by computing score distributions for similarity searches in the HPO. It is shown that exact P-value calculation improves clinical diagnosis using the HPO compared to approaches based on sampling. Conclusions: The new algorithm enables, for the first time, exact P-value calculation via exact score distribution computation for ontology similarity searches. The approach is applicable to any ontology for which the annotation-propagation rule holds and can improve any bioinformatic method that uses only the raw similarity scores. The algorithm was implemented in Java, supports any ontology in OBO format, and is available for non-commercial and academic usage under: https://compbio.charite.de/svn/hpo/trunk/src/tools/significance/
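To make the idea of an exact score distribution concrete, the following is a minimal sketch of the naive approach that the abstract uses as its baseline. It is not the authors' subgraph-collapsing algorithm. The toy ontology, the annotations, and all function names are hypothetical. The sketch also assumes that a query is scored by averaging, over its terms, the best Resnik match among the target's terms, and that the null hypothesis draws every query of q distinct terms with equal probability.

```python
from fractions import Fraction
from itertools import combinations
from math import log

# Hypothetical toy ontology: term -> parents (a DAG rooted at "root").
PARENTS = {"root": [], "a": ["root"], "b": ["root"], "c": ["a"], "d": ["a", "b"]}

# Hypothetical direct annotations: object -> set of terms.
ANNOTATIONS = {
    "disease1": {"c"},
    "disease2": {"d"},
    "disease3": {"b"},
    "disease4": {"c", "d"},
}


def ancestors(term):
    """Return the term together with all of its ancestors."""
    seen, stack = {term}, [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def information_content():
    """IC(t) = log(N / n_t), with annotations propagated to all ancestors."""
    counts = {t: 0 for t in PARENTS}
    for terms in ANNOTATIONS.values():
        for t in set().union(*(ancestors(x) for x in terms)):
            counts[t] += 1
    n = len(ANNOTATIONS)
    return {t: log(n / c) for t, c in counts.items() if c > 0}


def resnik(t1, t2, ic):
    """Resnik similarity: IC of the most informative common ancestor."""
    return max(ic[t] for t in ancestors(t1) & ancestors(t2))


def score(query, target_terms, ic):
    """Average, over query terms, of the best match among the target's terms."""
    best = [max(resnik(q, t, ic) for t in target_terms) for q in query]
    return sum(best) / len(best)


def exact_distribution(target_terms, q, ic):
    """Enumerate every query of q distinct terms (exponential in q)."""
    queries = list(combinations(sorted(PARENTS), q))
    dist = {}
    for query in queries:
        s = round(score(query, target_terms, ic), 12)  # merge float near-ties
        dist[s] = dist.get(s, 0) + Fraction(1, len(queries))
    return dist


def p_value(observed, dist):
    """P(S >= observed) under the enumerated null distribution."""
    return sum(p for s, p in dist.items() if s >= round(observed, 12))


ic = information_content()
dist = exact_distribution(ANNOTATIONS["disease1"], 2, ic)
observed = score(("c", "d"), ANNOTATIONS["disease1"], ic)
print(p_value(observed, dist))
```

The enumeration visits every possible query, so its cost grows with the binomial coefficient of the ontology size and q. That growth is what makes the naive approach impractical on an ontology the size of the HPO.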
Pages: 12