Beegle: from literature mining to disease-gene discovery

Cited by: 19
Authors
ElShal, Sarah [1 ,2 ]
Tranchevent, Leon-Charles [1 ,2 ,3 ,4 ,5 ]
Sifrim, Alejandro [1 ,2 ,6 ]
Ardeshirdavani, Amin [1 ,2 ]
Davis, Jesse [7 ]
Moreau, Yves [1 ,2 ]
Affiliations
[1] Katholieke Univ Leuven, Dept Elect Engn ESAT STADIUS Ctr Dynam Syst, Signal Proc & Data Analyt Dept, B-3001 Leuven, Belgium
[2] Katholieke Univ Leuven, iMinds Future Hlth Dept, B-3001 Leuven, Belgium
[3] Canc Res Ctr Lyon, INSERM, UMR S1052, CNRS,UMR5286, Lyon, France
[4] Univ Lyon 1, F-69622 Villeurbanne, France
[5] Ctr Leon Berard, F-69373 Lyon, France
[6] Wellcome Trust Sanger Inst, Wellcome Trust Genome Campus, Cambridge CB10 1SA, England
[7] Katholieke Univ Leuven, Dept Comp Sci DTAI, B-3001 Leuven, Belgium
Keywords
CANDIDATE GENES; PRIORITIZATION; ASSOCIATION; IDENTIFICATION
DOI
10.1093/nar/gkv905
Chinese Library Classification number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010; 081704
Abstract
Disease-gene identification is a challenging process that has multiple applications within functional genomics and personalized medicine. Typically, this process involves both finding genes known to be associated with the disease (through literature search) and carrying out preliminary experiments or screens (e.g. linkage or association studies, copy number analyses, expression profiling) to determine a set of promising candidates for experimental validation. This requires extensive time and monetary resources. We describe Beegle, an online search and discovery engine that attempts to simplify this process by automating the typical approaches. It starts by mining the literature to quickly extract a set of genes known to be linked with a given query, then it integrates the learning methodology of Endeavour (a gene prioritization tool) to train a genomic model and rank a set of candidate genes to generate novel hypotheses. In a realistic evaluation setup, Beegle has an average recall of 84% in the top 100 returned genes as a search engine, which improves the discovery engine by 12.6% in the top 5% prioritized genes. Beegle is publicly available at http://beegle.esat.kuleuven.be/.
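The abstract reports an average recall within the top 100 returned genes but does not spell out how that figure is computed. The sketch below shows the conventional definition of recall at a cutoff k, averaged over queries; it is an assumption about the metric, not the authors' evaluation code. The function names and the gene lists are hypothetical, chosen only for illustration.

```python
def recall_at_k(ranked_genes, known_genes, k=100):
    """Fraction of the known genes that appear among the top-k ranked genes."""
    known = set(known_genes)
    if not known:
        raise ValueError("known_genes must not be empty")
    top_k = set(ranked_genes[:k])
    return len(known & top_k) / len(known)


def mean_recall_at_k(queries, k=100):
    """Average recall@k over an iterable of (ranked_genes, known_genes) pairs."""
    scores = [recall_at_k(ranked, known, k) for ranked, known in queries]
    return sum(scores) / len(scores)


# Toy data (illustrative only, not taken from the paper's evaluation).
ranked = ["BRCA1", "TP53", "BRCA2", "EGFR", "ATM"]
known = ["BRCA1", "BRCA2", "CHEK2"]
print(recall_at_k(ranked, known, k=3))
```

With k=3, two of the three known genes fall inside the cutoff, so the call returns two thirds. Averaging this value across many queries would give a figure comparable in form to the one quoted in the abstract.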
Pages: 8
Related papers
50 in total
  • [21] DISEASE-GENE ASSOCIATIONS IN ADMIXED POPULATIONS
    SRINIVASAN, MR
    CHAKRABORTY, R
    AMERICAN JOURNAL OF HUMAN GENETICS, 1991, 49 (04) : 483 - 483
  • [22] A large language model framework for literature-based disease-gene association prediction
    Li, Peng-Hsuan
    Sun, Yih-Yun
    Juan, Hsueh-Fen
    Chen, Chien-Yu
    Tsai, Huai-Kuang
    Huang, Jia-Hsin
    BRIEFINGS IN BIOINFORMATICS, 2025, 26 (01)
  • [23] Systematic identification of latent disease-gene associations from PubMed articles
    Zhang, Yuji
    Shen, Feichen
    Mojarad, Majid Rastegar
    Li, Dingcheng
    Liu, Sijia
    Tao, Cui
    Yu, Yue
    Liu, Hongfang
    PLOS ONE, 2018, 13 (01):
  • [24] Mining gene-centric relationships from literature: the roles of gene mutation and gene expression in supporting drug discovery
    Tari, Luis
    Patel, Jagruti
    Kuentzer, Jan
    Li, Ying
    Peng, Zhengwei
    Wang, Yuan
    Aguiar, Laura
    Cai, James
    INTERNATIONAL JOURNAL OF DATA MINING AND BIOINFORMATICS, 2014, 10 (04) : 357 - 373
  • [25] A computational framework for the prioritization of disease-gene candidates
    Fiona Browne
    Haiying Wang
    Huiru Zheng
    BMC Genomics, 16
  • [26] A probabilistic disease-gene finder for personal genomes
    Yandell, Mark
    Huff, Chad
    Hu, Hao
    Singleton, Marc
    Moore, Barry
    Xing, Jinchuan
    Jorde, Lynn B.
    Reese, Martin G.
    GENOME RESEARCH, 2011, 21 (09) : 1529 - 1542
  • [27] Integrating microarrays into disease-gene identification strategies
    Dobrin, SE
    Stephan, DA
    EXPERT REVIEW OF MOLECULAR DIAGNOSTICS, 2003, 3 (03) : 375 - 385
  • [28] Disease-Gene Association Using a Genetic Algorithm
    Tahmasebipour, Koosha
    Houghten, Sheridan
    2014 IEEE INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOENGINEERING (BIBE), 2014, : 191 - 197
  • [29] A computational framework for the prioritization of disease-gene candidates
    Browne, Fiona
    Wang, Haiying
    Zheng, Huiru
    BMC GENOMICS, 2015, 16
  • [30] Correction: Corrigendum: An analysis of disease-gene relationship from Medline abstracts by DigSee
    Jeongkyun Kim
    Jung-jae Kim
    Hyunju Lee
    Scientific Reports, 7