Infrastructure for Semantic Annotation in the Genomics Domain

Cited by: 0
Authors
El-Haj, Mahmoud [1]
Prentice, Sheryl [1]
Mariani, John [1]
Rutherford, Nathan [1]
Ide, Nancy [2]
Rayson, Paul [1]
Coole, Matt [1]
Knight, Jo [1]
Ezeani, Ignatius [1]
Piao, Scott [1]
Suderman, Keith [2]
Affiliations
[1] Univ Lancaster, Lancaster, England
[2] Vassar Coll, Poughkeepsie, NY 12601 USA
Funding
Wellcome Trust (UK)
Keywords
BioNLP; Ontology; Semantic Tagger; Corpus; PubMed; Genomics; Infrastructure
DOI
Not available
Chinese Library Classification
TP39 [Computer applications]
Discipline classification codes
081203; 0835
Abstract
We describe a novel super-infrastructure for biomedical text mining which incorporates an end-to-end pipeline for the collection, annotation, storage, retrieval and analysis of biomedical and life sciences literature, combining NLP and corpus linguistics methods. The infrastructure permits extreme-scale research on the open-access PubMed Central archive. It combines an updatable Gene Ontology Semantic Tagger (GOST) for entity identification and semantic markup in the literature with an NLP pipeline scheduler (Buster) to collect and process the corpus, and a bespoke columnar corpus database (LexiDB) for indexing. The corpus database is distributed to permit fast indexing, and provides a simple web front-end with corpus linguistics methods for sub-corpus comparison and retrieval. GOST is also connected as a service in the Language Application (LAPPS) Grid, where it is interoperable with other NLP tools and data in the Grid and can be combined with them in more complex workflows. In a literature-based discovery setting, we have created an annotated corpus of 9,776 papers with 5,481,543 words.
Pages: 6921-6929
Number of pages: 9