Infrastructure for Semantic Annotation in the Genomics Domain

Authors
El-Haj, Mahmoud [1]
Prentice, Sheryl [1]
Mariani, John [1]
Rutherford, Nathan [1]
Ide, Nancy [2]
Rayson, Paul [1]
Coole, Matt [1]
Knight, Jo [1]
Ezeani, Ignatius [1]
Piao, Scott [1]
Suderman, Keith [2]
Affiliations
[1] Univ Lancaster, Lancaster, England
[2] Vassar Coll, Poughkeepsie, NY 12601 USA
Funding
Wellcome Trust (UK)
Keywords
BioNLP; Ontology; Semantic Tagger; Corpus; PubMed; Genomics; Infrastructure
DOI
Not available
Chinese Library Classification
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
We describe a novel super-infrastructure for biomedical text mining which incorporates an end-to-end pipeline for the collection, annotation, storage, retrieval and analysis of biomedical and life sciences literature, combining NLP and corpus linguistics methods. The infrastructure permits extreme-scale research on the open access PubMed Central archive. It combines an updatable Gene Ontology Semantic Tagger (GOST) for entity identification and semantic markup in the literature with an NLP pipeline scheduler (Buster) to collect and process the corpus, and a bespoke columnar corpus database (LexiDB) for indexing. The corpus database is distributed to permit fast indexing, and provides a simple web front-end with corpus linguistics methods for sub-corpus comparison and retrieval. GOST is also connected as a service in the Language Application (LAPPS) Grid, where it is interoperable with other NLP tools and data in the Grid and can be combined with them in more complex workflows. In a literature-based discovery setting, we have created an annotated corpus of 9,776 papers with 5,481,543 words.
Pages: 6921-6929
Number of pages: 9