A high-precision rule-based extraction system for expanding geospatial metadata in GenBank records

Cited by: 15
Authors
Tahsin, Tasnia [1 ]
Weissenbacher, Davy [1 ]
Rivera, Robert [1 ]
Beard, Rachel [1 ]
Firago, Mari [1 ]
Wallstrom, Garrick [1 ]
Scotch, Matthew [1 ]
Gonzalez, Graciela [1 ]
Affiliations
[1] Arizona State Univ, Dept Biomed Informat, 13212 E Shea Blvd, Scottsdale, AZ 85259 USA
Funding
National Institutes of Health (USA)
Keywords
phylogeography; information extraction; natural language processing; spatial epidemiology; spread; visualization; knowledge; H5N1
DOI
10.1093/jamia/ocv172
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Objective: The metadata reflecting the location of the infected host (LOIH) of virus sequences in GenBank often lacks specificity. This work seeks to enhance this metadata by extracting more specific geographic information from related full-text articles and mapping it to latitude/longitude coordinates using knowledge derived from external geographical databases.

Materials and Methods: We developed a rule-based information extraction framework for linking GenBank records to the latitude/longitude of the LOIH. Our system first extracts existing geospatial metadata from GenBank records and attempts to improve it by seeking additional, relevant geographic information from text and tables in related full-text PubMed Central articles. The final extracted locations of the records, based on data assimilated from these sources, are then disambiguated and mapped to their respective geo-coordinates. We evaluated our approach on a manually annotated dataset comprising 5728 GenBank records for the influenza A virus.

Results: The precision, recall, and F-measure of our system for linking GenBank records to the latitude/longitude of their LOIH were 0.832, 0.967, and 0.894, respectively.

Discussion: Our system had a high level of accuracy for linking GenBank records to the geo-coordinates of the LOIH. However, it can be further improved by expanding our database of geospatial data, incorporating spelling correction, and enhancing the rules used for extraction.

Conclusion: Our system performs reasonably well for linking GenBank records for the influenza A virus to the geo-coordinates of their LOIH, based on record metadata and information extracted from related full-text articles.
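
The three reported scores can be cross-checked against one another. The sketch below assumes the F-measure is the balanced F1 score, that is, the harmonic mean of precision and recall; the abstract does not state the weighting, so this is an assumption, and the function name `f_measure` is illustrative rather than taken from the paper.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure (F1): harmonic mean of precision and recall.

    Assumption: equal weighting of precision and recall, which the
    abstract does not state explicitly.
    """
    if precision + recall == 0:
        # Avoid division by zero when both scores are zero.
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values reported in the abstract.
reported_precision = 0.832
reported_recall = 0.967

print(round(f_measure(reported_precision, reported_recall), 3))  # 0.894
```

Under that assumption, the computed value agrees with the reported F-measure of 0.894.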
Pages: 934-941
Number of pages: 8