Beegle: from literature mining to disease-gene discovery

Cited by: 19
Authors
ElShal, Sarah [1, 2]
Tranchevent, Leon-Charles [1, 2, 3, 4, 5]
Sifrim, Alejandro [1, 2, 6]
Ardeshirdavani, Amin [1, 2]
Davis, Jesse [7]
Moreau, Yves [1, 2]
Affiliations
[1] Katholieke Univ Leuven, Dept Elect Engn ESAT STADIUS Ctr Dynam Syst, Signal Proc & Data Analyt Dept, B-3001 Leuven, Belgium
[2] Katholieke Univ Leuven, iMinds Future Hlth Dept, B-3001 Leuven, Belgium
[3] Canc Res Ctr Lyon, INSERM, UMR S1052, CNRS,UMR5286, Lyon, France
[4] Univ Lyon 1, F-69622 Villeurbanne, France
[5] Ctr Leon Berard, F-69373 Lyon, France
[6] Wellcome Trust Sanger Inst, Wellcome Trust Genome Campus, Cambridge CB10 1SA, England
[7] Katholieke Univ Leuven, Dept Comp Sci DTAI, B-3001 Leuven, Belgium
Keywords
CANDIDATE GENES; PRIORITIZATION; ASSOCIATION; IDENTIFICATION;
DOI
10.1093/nar/gkv905
Chinese Library Classification (CLC)
Q5 [Biochemistry]; Q7 [Molecular Biology];
Discipline classification code
071010 ; 081704 ;
Abstract
Disease-gene identification is a challenging process that has multiple applications within functional genomics and personalized medicine. Typically, this process involves both finding genes known to be associated with the disease (through literature search) and carrying out preliminary experiments or screens (e.g. linkage or association studies, copy number analyses, expression profiling) to determine a set of promising candidates for experimental validation. This requires extensive time and monetary resources. We describe Beegle, an online search and discovery engine that attempts to simplify this process by automating the typical approaches. It starts by mining the literature to quickly extract a set of genes known to be linked with a given query, then it integrates the learning methodology of Endeavour (a gene prioritization tool) to train a genomic model and rank a set of candidate genes to generate novel hypotheses. In a realistic evaluation setup, Beegle has an average recall of 84% in the top 100 returned genes as a search engine, which improves the discovery engine by 12.6% in the top 5% prioritized genes. Beegle is publicly available at http://beegle.esat.kuleuven.be/.
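The abstract reports recall within the top 100 returned genes. As a minimal sketch of how such a recall-at-k figure is commonly computed (not Beegle's actual evaluation code), the Python below counts how many known disease genes appear among the first k entries of a ranked list. The function name `recall_at_k`, the gene lists, and the cutoff are hypothetical and chosen only for illustration; the record does not describe the paper's exact evaluation protocol.

```python
from typing import Iterable, Sequence


def recall_at_k(ranked_genes: Sequence[str],
                known_genes: Iterable[str],
                k: int = 100) -> float:
    """Fraction of known genes that appear among the first k ranked genes."""
    known = set(known_genes)
    if not known:
        raise ValueError("known_genes must not be empty")
    top_k = set(ranked_genes[:k])
    return len(known & top_k) / len(known)


# Hypothetical toy data, not taken from the paper.
ranked = ["BRCA1", "TP53", "EGFR", "BRCA2", "KRAS"]
known = {"BRCA1", "BRCA2", "ATM"}

# Two of the three known genes fall within the first four ranks.
toy_recall = recall_at_k(ranked, known, k=4)
```

An average over many queries, as in the abstract's "average recall", would simply be the mean of this value across queries.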
Pages: 8