A BERT-Based Generation Model to Transform Medical Texts to SQL Queries for Electronic Medical Records: Model Development and Validation

Cited by: 7
Authors
Pan, Youcheng [1]
Wang, Chenghao [1]
Hu, Baotian [1]
Xiang, Yang [2]
Wang, Xiaolong [1]
Chen, Qingcai [1, 2]
Chen, Junjie [1]
Du, Jingcheng [3]
Affiliations
[1] Harbin Inst Technol, Intelligent Comp Res Ctr, 6,Pingshan 1st Rd, Shenzhen 518055, Peoples R China
[2] Peng Cheng Lab, Shenzhen, Peoples R China
[3] Univ Texas Hlth Sci Ctr Houston, Houston, TX 77030 USA
Keywords
electronic medical record; text-to-SQL generation; BERT; grammar-based decoding; tree-structured intermediate representation
DOI
10.2196/32698
Chinese Library Classification (CLC) number
R-058
Abstract
Background: Electronic medical records (EMRs) are usually stored in relational databases, so retrieving information of interest requires SQL queries. Writing such queries can be challenging for medical experts who lack database expertise. Existing text-to-SQL generation methods have not been fully adopted in the medical domain.
Objective: This study aimed to propose a neural generation model that jointly considers the characteristics of medical text and the structure of SQL to automatically transform medical texts into SQL queries for EMRs.
Methods: We proposed a medical text-to-SQL model (MedTS) that uses a pretrained Bidirectional Encoder Representations From Transformers (BERT) model as the encoder and a grammar-based long short-term memory (LSTM) network as the decoder. The decoder predicts an intermediate representation that can easily be transformed into the final SQL query. We adopted a syntax tree as the intermediate representation rather than treating the SQL query as an ordinary word sequence; this is more in line with the tree-structured nature of SQL and also effectively reduces the search space during generation. Experiments were conducted on the MIMICSQL dataset, and 5 competitor methods were compared.
Results: On the test set, MedTS achieved accuracies of 0.784 in terms of logic form and 0.899 in terms of execution, significantly outperforming the existing state-of-the-art methods. Further analyses showed that performance on each component of the generated SQL was relatively balanced and offered substantial improvements.
Conclusions: MedTS was effective and robust in improving the performance of medical text-to-SQL generation, indicating strong potential for application in real medical scenarios.
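The abstract reports two accuracy figures, one for logic form and one for execution. As a rough illustration of why the two can differ, the sketch below scores a few predicted queries against a gold query both ways. It is not the authors' evaluation script: the `demographic` table, its rows, the example queries, and the helper names `normalize`, `logic_form_match`, and `execution_match` are all hypothetical, and the string normalization is deliberately crude.

```python
import sqlite3


def normalize(sql: str) -> str:
    """Crude canonical form: lowercase, collapse whitespace, drop a trailing ';'.

    Assumption: this also lowercases string literals, which a real
    evaluator would need to handle more carefully.
    """
    return " ".join(sql.lower().strip().rstrip(";").split())


def logic_form_match(pred: str, gold: str) -> bool:
    """Logic-form accuracy (as sketched here): the query text itself must match."""
    return normalize(pred) == normalize(gold)


def execution_match(conn: sqlite3.Connection, pred: str, gold: str) -> bool:
    """Execution accuracy (as sketched here): the returned rows must match."""
    try:
        pred_rows = conn.execute(pred).fetchall()
    except sqlite3.Error:
        # A prediction that fails to run counts as wrong.
        return False
    gold_rows = conn.execute(gold).fetchall()
    # Compare as unordered collections of rows.
    return sorted(pred_rows, key=repr) == sorted(gold_rows, key=repr)


# Hypothetical toy table standing in for an EMR database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demographic (subject_id INTEGER, age INTEGER, gender TEXT)")
conn.executemany(
    "INSERT INTO demographic VALUES (?, ?, ?)",
    [(1, 70, "F"), (2, 45, "M"), (3, 82, "F")],
)

gold = "SELECT COUNT(*) FROM demographic WHERE age > 65"
preds = [
    "select count(*) from demographic where age > 65",    # same query, different casing
    "SELECT COUNT(*) FROM demographic WHERE age >= 66",   # different text, same result here
    "SELECT COUNT(*) FROM demographic WHERE age > 80",    # different text, different result
]

lf_acc = sum(logic_form_match(p, gold) for p in preds) / len(preds)
ex_acc = sum(execution_match(conn, p, gold) for p in preds) / len(preds)
print(f"logic form accuracy: {lf_acc:.3f}, execution accuracy: {ex_acc:.3f}")
```

In this toy setup the second prediction is written differently from the gold query yet returns the same rows, so it counts toward execution accuracy only. The same effect is one plausible reason an execution score can be higher than a logic form score.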
Pages: 14