Beegle: from literature mining to disease-gene discovery

Cited by: 19
Authors
ElShal, Sarah [1, 2]
Tranchevent, Leon-Charles [1, 2, 3, 4, 5]
Sifrim, Alejandro [1, 2, 6]
Ardeshirdavani, Amin [1, 2]
Davis, Jesse [7]
Moreau, Yves [1, 2]
Affiliations
[1] Katholieke Univ Leuven, Dept Elect Engn ESAT STADIUS Ctr Dynam Syst, Signal Proc & Data Analyt Dept, B-3001 Leuven, Belgium
[2] Katholieke Univ Leuven, iMinds Future Hlth Dept, B-3001 Leuven, Belgium
[3] Canc Res Ctr Lyon, INSERM, UMR S1052, CNRS,UMR5286, Lyon, France
[4] Univ Lyon 1, F-69622 Villeurbanne, France
[5] Ctr Leon Berard, F-69373 Lyon, France
[6] Wellcome Trust Sanger Inst, Wellcome Trust Genome Campus, Cambridge CB10 1SA, England
[7] Katholieke Univ Leuven, Dept Comp Sci DTAI, B-3001 Leuven, Belgium
Keywords
CANDIDATE GENES; PRIORITIZATION; ASSOCIATION; IDENTIFICATION
DOI
10.1093/nar/gkv905
Chinese Library Classification (CLC)
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification codes
071010; 081704
Abstract
Disease-gene identification is a challenging process that has multiple applications within functional genomics and personalized medicine. Typically, this process involves both finding genes known to be associated with the disease (through literature search) and carrying out preliminary experiments or screens (e.g. linkage or association studies, copy number analyses, expression profiling) to determine a set of promising candidates for experimental validation. This requires extensive time and monetary resources. We describe Beegle, an online search and discovery engine that attempts to simplify this process by automating the typical approaches. It starts by mining the literature to quickly extract a set of genes known to be linked with a given query, then it integrates the learning methodology of Endeavour (a gene prioritization tool) to train a genomic model and rank a set of candidate genes to generate novel hypotheses. In a realistic evaluation setup, Beegle has an average recall of 84% in the top 100 returned genes as a search engine, which improves the discovery engine by 12.6% in the top 5% prioritized genes. Beegle is publicly available at http://beegle.esat.kuleuven.be/.
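The abstract reports recall within the top 100 returned genes. As a rough illustration of how such a metric is commonly computed, here is a minimal sketch. It is not Beegle's evaluation code; the function name `recall_at_k` and the toy gene lists are hypothetical and chosen only for illustration.

```python
def recall_at_k(ranked_genes, known_genes, k=100):
    """Fraction of the known disease genes that appear in the top-k ranked genes."""
    known = set(known_genes)
    if not known:
        raise ValueError("known_genes must not be empty")
    top_k = set(ranked_genes[:k])  # only the first k entries of the ranking count
    return len(known & top_k) / len(known)


# Hypothetical toy data: a ranking returned for a query, and the genes
# assumed to be truly linked to that query.
ranked = ["BRCA1", "TP53", "BRCA2", "ATM", "CHEK2"]
known = ["BRCA1", "BRCA2", "PALB2"]

# Two of the three known genes are in the top 3, so the recall is 2/3.
print(recall_at_k(ranked, known, k=3))
```

An average recall such as the one quoted in the abstract would be the mean of this value over many queries, under whatever evaluation setup the authors used.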
Pages: 8
Related papers
50 in total
  • [1] Literature mining discerns latent disease-gene relationships
    Rai, Priyadarshini
    Jain, Atishay
    Kumar, Shivani
    Sharma, Divya
    Jha, Neha
    Chawla, Smriti
    Raj, Abhijit
    Gupta, Apoorva
    Poonia, Sarita
    Majumdar, Angshul
    Chakraborty, Tanmoy
    Ahuja, Gaurav
    Sengupta, Debarka
    BIOINFORMATICS, 2024, 40 (04)
  • [2] Disease-gene discovery pipeline.
    Braun, TA
    Casavant, TL
    Sheffield, VC
    AMERICAN JOURNAL OF HUMAN GENETICS, 2000, 67 (04) : 255 - 255
  • [3] On the assessment of statistical significance in disease-gene discovery
    Zhao, LP
    Prentice, R
    Shen, FM
    Hsu, L
    AMERICAN JOURNAL OF HUMAN GENETICS, 1999, 64 (06) : 1739 - 1753
  • [4] Large-Scale Discovery of Disease-Disease and Disease-Gene Associations
    Djordje Gligorijevic
    Jelena Stojanovic
    Nemanja Djuric
    Vladan Radosavljevic
    Mihajlo Grbovic
    Rob J. Kulathinal
    Zoran Obradovic
    Scientific Reports, 6
  • [5] DISEASES: Text mining and data integration of disease-gene associations
    Pletscher-Frankild, Sune
    Palleja, Albert
    Tsafou, Kalliopi
    Binder, Janos X.
    Jensen, Lars Juhl
    METHODS, 2015, 74 : 83 - 89
  • [6] Next-generation diagnostics and disease-gene discovery with the Exomiser
    Smedley, Damian
    Jacobsen, Julius O. B.
    Jaeger, Marten
    Koehler, Sebastian
    Holtgrewe, Manuel
    Schubach, Max
    Siragusa, Enrico
    Zemojtel, Tomasz
    Buske, Orion J.
    Washington, Nicole L.
    Bone, William P.
    Haendel, Melissa A.
    Robinson, Peter N.
    NATURE PROTOCOLS, 2015, 10 (12) : 2004 - 2015
  • [7] Large-Scale Discovery of Disease-Disease and Disease-Gene Associations
    Gligorijevic, Djordje
    Stojanovic, Jelena
    Djuric, Nemanja
    Radosavljevic, Vladan
    Grbovic, Mihajlo
    Kulathinal, Rob J.
    Obradovic, Zoran
    SCIENTIFIC REPORTS, 2016, 6
  • [8] Next-generation diagnostics and disease-gene discovery with the Exomiser
    Damian Smedley
    Julius O B Jacobsen
    Marten Jäger
    Sebastian Köhler
    Manuel Holtgrewe
    Max Schubach
    Enrico Siragusa
    Tomasz Zemojtel
    Orion J Buske
    Nicole L Washington
    William P Bone
    Melissa A Haendel
    Peter N Robinson
    Nature Protocols, 2015, 10 : 2004 - 2015
  • [9] An automated approach to identifying disease-gene associations from the medical literature to inform gene panel design
    Kiel, Mark
    Schu, Matthew
    Schwartz, Steve
    Weigman, Victor
    CANCER RESEARCH, 2017, 77
  • [10] Strategies for exome and genome sequence data analysis in disease-gene discovery projects
    Robinson, P. N.
    Krawitz, P.
    Mundlos, S.
    CLINICAL GENETICS, 2011, 80 (02) : 127 - 132