Text-mining assisted regulatory annotation

Times cited: 19
Authors
Aerts, Stein [1, 2]
Haeussler, Maximilian [3]
van Vooren, Steven [4]
Griffith, Obi L. [5]
Hulpiau, Paco [6]
Jones, Steven J. M. [5]
Montgomery, Stephen B.
Bergman, Casey M. [7]
Affiliations
[1] VIB, Dept Mol & Dev Genet, Neurogenet Lab, B-3000 Louvain, Belgium
[2] Katholieke Univ Leuven, Sch Med, Dept Human Genet, B-3000 Louvain, Belgium
[3] CNRS, Inst Neurosci A Fessard, F-91198 Gif Sur Yvette, France
[4] Katholieke Univ Leuven, Dept Elect Engn, B-3001 Heverlee, Belgium
[5] British Columbia Canc Agcy, Canada's Michael Smith Genome Sci Ctr, Vancouver, BC V5Z 4E6, Canada
[6] Univ Ghent VIB, Dept Mol Biomed Res, B-9052 Ghent, Belgium
[7] Univ Manchester, Fac Life Sci, Manchester M13 9PT, Lancs, England
Funding
US National Science Foundation; Natural Sciences and Engineering Research Council of Canada; Canadian Institutes of Health Research
DOI
10.1186/gb-2008-9-2-r31
Chinese Library Classification (CLC)
Q81 [Bioengineering (biotechnology)]; Q93 [Microbiology]
Subject classification codes
071005; 0836; 090102; 100705
Abstract
Background: Decoding transcriptional regulatory networks and the genomic cis-regulatory logic implemented in their control nodes is a fundamental challenge in genome biology. High-throughput computational and experimental analyses of regulatory networks and sequences rely heavily on positive control data from prior small-scale experiments, but the vast majority of previously discovered regulatory data remains locked in the biomedical literature.

Results: We develop text-mining strategies to identify relevant publications and extract sequence information to assist the regulatory annotation process. Using a vector space model to identify Medline abstracts from papers likely to have high cis-regulatory content, we demonstrate that document relevance ranking can assist the curation of transcriptional regulatory networks and estimate that at least 30,000 papers harbor unannotated cis-regulatory data. In addition, we show that DNA sequences can be extracted from primary text with high cis-regulatory content and mapped to genome sequences as a means of identifying the location, organism and target gene information that is critical to the cis-regulatory annotation process.

Conclusion: Our results demonstrate that text-mining technologies can be successfully integrated with genome annotation systems, thereby increasing the availability of annotated cis-regulatory data needed to catalyze advances in the field of gene regulation.
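The two text-mining steps summarized in the abstract, ranking abstracts with a vector space model and extracting DNA sequences from text to locate them in a genome, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: it assumes plain TF-IDF weighting with cosine similarity to the centroid of known cis-regulatory abstracts, a hypothetical 12-nt minimum length for sequence candidates, and exact string matching on both strands in place of a dedicated sequence-search tool. All function names (`rank_by_relevance`, `extract_dna`, `map_to_genome`, and helpers) are hypothetical.

```python
import math
import re
from collections import Counter

TOKEN_RE = re.compile(r"[a-z0-9]+")
# A run of nucleotide letters, optionally split by single hyphens, that is not
# embedded in a longer word (so "GATA" inside "GATAXYZ" is not matched).
DNA_RE = re.compile(r"(?<![A-Za-z])[ACGTacgt]+(?:-[ACGTacgt]+)*(?![A-Za-z])")
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def tokenize(text: str) -> list[str]:
    return TOKEN_RE.findall(text.lower())


def tfidf_vectors(docs: list[str]) -> list[dict[str, float]]:
    """Sparse TF-IDF vectors: raw term count times log(N / document frequency)."""
    counts = [Counter(tokenize(d)) for d in docs]
    df = Counter(term for c in counts for term in c)
    idf = {term: math.log(len(docs) / n) for term, n in df.items()}
    return [{t: tf * idf[t] for t, tf in c.items()} for c in counts]


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_by_relevance(training: list[str], candidates: list[str]) -> list[tuple[int, float]]:
    """Rank candidate abstracts by cosine similarity to the centroid of the
    training abstracts (assumed known cis-regulatory papers).
    Returns (candidate index, score) pairs, best first."""
    vectors = tfidf_vectors(training + candidates)
    centroid: dict[str, float] = {}
    for vec in vectors[: len(training)]:
        for term, w in vec.items():
            centroid[term] = centroid.get(term, 0.0) + w
    scores = [(i, cosine(centroid, vec)) for i, vec in enumerate(vectors[len(training):])]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


def extract_dna(text: str, min_len: int = 12) -> list[str]:
    """Pull candidate DNA sequences out of free text, dropping hyphens."""
    seqs = []
    for match in DNA_RE.finditer(text):
        seq = match.group().replace("-", "").upper()
        if len(seq) >= min_len:  # short runs such as "GATA" or "a" are ordinary text
            seqs.append(seq)
    return seqs


def revcomp(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def map_to_genome(seq: str, genome: str) -> list[tuple[int, str]]:
    """Exact matches of seq on both strands of an uppercase genome string.
    Returns (0-based forward-strand start, strand) pairs."""
    hits = []
    for strand, query in (("+", seq), ("-", revcomp(seq))):
        start = genome.find(query)
        while start != -1:
            hits.append((start, strand))
            start = genome.find(query, start + 1)
    return hits
```

Exact matching only recovers perfect copies of the printed sequence; mutated probes, ambiguity codes such as N, or sequences broken across lines or spaces would need looser extraction rules and a mismatch-tolerant aligner.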
Pages: 13
Related papers (50 in total)
  • [1] Text-mining assisted regulatory annotation
    Aerts, Stein
    Haeussler, Maximilian
    van Vooren, Steven
    Griffith, Obi L.
    Hulpiau, Paco
    Jones, Steven J. M.
    Montgomery, Stephen B.
    Bergman, Casey M.
    [J]. Genome Biology, 2008, 9
  • [2] A Text-Mining System for Concept Annotation in Biomedical Full Text Articles
    Wei, Chih-Hsuan
    Allot, Alexis
    Leaman, Robert
    Lu, Zhiyong
    [J]. ACM-BCB'19: Proceedings of the 10th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics, 2019: 540
  • [3] PubMeth: a cancer methylation database combining text-mining and expert annotation
    Ongenaert, Mate
    Van Neste, Leander
    De Meyer, Tim
    Menschaert, Gerben
    Bekaert, Sofie
    Van Criekinge, Wim
    [J]. Nucleic Acids Research, 2008, 36: D842-D846
  • [4] Text-Mining and Neuroscience
    Ambert, Kyle H.
    Cohen, Aaron M.
    [J]. Bioinformatics of Behavior: Part 1, 2012, 103: 109-132
  • [5] Preimplantation development regulatory pathway construction through a text-mining approach
    Donnard, Elisa
    Barbosa-Silva, Adriano
    Guedes, Rafael L. M.
    Fernandes, Gabriel R.
    Velloso, Henrique
    Kohn, Matthew J.
    Andrade-Navarro, Miguel A.
    Ortega, J. Miguel
    [J]. BMC Genomics, 2011, 12
  • [6] Application of text-mining for updating protein post-translational modification annotation in UniProtKB
    Veuthey, Anne-Lise
    Bridge, Alan
    Gobeill, Julien
    Ruch, Patrick
    McEntyre, Johanna R.
    Bougueleret, Lydie
    Xenarios, Ioannis
    [J]. BMC Bioinformatics, 2013, 14
  • [7] Text-Mining the Voice of the People
    Evangelopoulos, Nicholas
    Visinescu, Lucian
    [J]. Communications of the ACM, 2012, 55(2): 55-62
  • [8] Maximizing text-mining performance
    Weiss, S. M.
    Apte, C.
    Damerau, F. J.
    Johnson, D. E.
    Oles, F. J.
    Goetz, T.
    Hampp, T.
    [J]. IEEE Intelligent Systems & Their Applications, 1999, 14(4): 63-69