A high-precision rule-based extraction system for expanding geospatial metadata in GenBank records

Cited by: 15
Authors
Tahsin, Tasnia [1]
Weissenbacher, Davy [1]
Rivera, Robert [1]
Beard, Rachel [1]
Firago, Mari [1]
Wallstrom, Garrick [1]
Scotch, Matthew [1]
Gonzalez, Graciela [1]
Affiliations
[1] Arizona State Univ, Dept Biomed Informat, 13212 E Shea Blvd, Scottsdale, AZ 85259 USA
Funding
National Institutes of Health (NIH), USA;
Keywords
phylogeography; information extraction; natural language processing; SPATIAL EPIDEMIOLOGY; PHYLOGEOGRAPHY; SPREAD; VISUALIZATION; KNOWLEDGE; H5N1;
DOI
10.1093/jamia/ocv172
Chinese Library Classification number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
Objective: The metadata reflecting the location of the infected host (LOIH) of virus sequences in GenBank often lacks specificity. This work seeks to enhance this metadata by extracting more specific geographic information from related full-text articles and mapping it to latitude/longitude coordinates using knowledge derived from external geographical databases.
Materials and Methods: We developed a rule-based information extraction framework for linking GenBank records to the latitude/longitude of the LOIH. Our system first extracts existing geospatial metadata from GenBank records and attempts to improve it by seeking additional, relevant geographic information from text and tables in related full-text PubMed Central articles. The final extracted locations of the records, based on data assimilated from these sources, are then disambiguated and mapped to their respective geo-coordinates. We evaluated our approach on a manually annotated dataset comprising 5728 GenBank records for the influenza A virus.
Results: We found the precision, recall, and f-measure of our system for linking GenBank records to the latitude/longitude of their LOIH to be 0.832, 0.967, and 0.894, respectively.
Discussion: Our system had a high level of accuracy for linking GenBank records to the geo-coordinates of the LOIH. However, it can be further improved by expanding our database of geospatial data, incorporating spell correction, and enhancing the rules used for extraction.
Conclusion: Our system performs reasonably well for linking GenBank records for the influenza A virus to the geo-coordinates of their LOIH based on record metadata and information extracted from related full-text articles.
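The three scores reported in the abstract can be cross-checked against each other. The sketch below assumes the f-measure is the balanced F1 score, that is, the harmonic mean of precision and recall; the abstract does not state which variant was used, and the function name `f_measure` is hypothetical, chosen only for this illustration.

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """General F-beta score; beta=1.0 gives the balanced F1 (an assumption here)."""
    if precision == 0.0 and recall == 0.0:
        # Avoid division by zero when nothing was extracted correctly.
        return 0.0
    b2 = beta * beta
    return (1.0 + b2) * precision * recall / (b2 * precision + recall)


# Precision and recall as reported in the abstract.
reported_precision = 0.832
reported_recall = 0.967

f1 = f_measure(reported_precision, reported_recall)
print(round(f1, 3))  # 0.894
```

Under that assumption the computed value rounds to the reported 0.894, so the three figures are mutually consistent.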
Pages: 934-941
Page count: 8