A BERT-Based Generation Model to Transform Medical Texts to SQL Queries for Electronic Medical Records: Model Development and Validation

Cited by: 7
Authors
Pan, Youcheng [1]
Wang, Chenghao [1]
Hu, Baotian [1]
Xiang, Yang [2]
Wang, Xiaolong [1]
Chen, Qingcai [1, 2]
Chen, Junjie [1]
Du, Jingcheng [3]
Affiliations
[1] Harbin Inst Technol, Intelligent Comp Res Ctr, 6,Pingshan 1st Rd, Shenzhen 518055, Peoples R China
[2] Peng Cheng Lab, Shenzhen, Peoples R China
[3] Univ Texas Hlth Sci Ctr Houston, Houston, TX 77030 USA
Keywords
electronic medical record; text-to-SQL generation; BERT; grammar-based decoding; tree-structured intermediate representation;
DOI
10.2196/32698
Chinese Library Classification (CLC) number
R-058
Abstract
Background: Electronic medical records (EMRs) are usually stored in relational databases, so retrieving information of interest requires SQL queries. Writing such queries can be challenging for medical experts who lack database expertise. Existing text-to-SQL generation studies have not been fully adopted in the medical domain.
Objective: This study aimed to propose a neural generation model that jointly considers the characteristics of medical text and the structure of SQL to automatically transform medical texts into SQL queries for EMRs.
Methods: We proposed a medical text-to-SQL model (MedTS), which employs a pretrained Bidirectional Encoder Representations From Transformers (BERT) model as the encoder and a grammar-based long short-term memory (LSTM) network as the decoder to predict an intermediate representation that can easily be transformed into the final SQL query. We adopted the syntax tree as the intermediate representation rather than treating the SQL query as an ordinary word sequence; this is more in line with the tree-structured nature of SQL and also effectively reduces the search space during generation. Experiments were conducted on the MIMICSQL dataset, and 5 competitor methods were compared.
Results: MedTS achieved accuracies of 0.784 and 0.899 on the test set in terms of logic form and execution, respectively, significantly outperforming the existing state-of-the-art methods. Further analyses showed that performance on each component of the generated SQL was relatively balanced and substantially improved.
Conclusions: The proposed MedTS was effective and robust in improving the performance of medical text-to-SQL generation, indicating strong potential for application in real medical scenarios.
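To make the two ideas in the abstract concrete (a tree-structured intermediate representation, and the difference between logic form accuracy and execution accuracy), here is a minimal Python sketch. It is not the MedTS implementation: the `Node` class, the grammar labels, the `demographic` table, and its columns are all hypothetical and chosen only for illustration. The tree is written by hand here, whereas in the paper a decoder predicts it.

```python
import sqlite3
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """One node of a toy syntax tree (hypothetical grammar, not the paper's)."""
    label: str
    children: List["Node"] = field(default_factory=list)
    value: str = ""


def to_sql(node: Node) -> str:
    """Flatten the toy tree into a SQL string by recursive traversal."""
    if node.label == "Query":
        select, table, *rest = node.children
        sql = f"SELECT {to_sql(select)} FROM {to_sql(table)}"
        if rest:
            sql += f" WHERE {to_sql(rest[0])}"
        return sql
    if node.label == "Agg":
        return f"{node.value}({to_sql(node.children[0])})"
    if node.label == "Cond":
        column, literal = node.children
        return f"{to_sql(column)} {node.value} {to_sql(literal)}"
    if node.label == "Literal":
        return "'" + node.value.replace("'", "''") + "'"
    return node.value  # Column and Table leaves carry their name in `value`


def logic_form_match(pred: str, gold: str) -> bool:
    """Toy logic form check: string equality after whitespace normalization."""
    return " ".join(pred.split()) == " ".join(gold.split())


def execution_match(conn: sqlite3.Connection, pred: str, gold: str) -> bool:
    """Toy execution check: both queries return the same rows."""
    try:
        return conn.execute(pred).fetchall() == conn.execute(gold).fetchall()
    except sqlite3.Error:
        return False  # an unexecutable prediction counts as a miss


# Hypothetical question: "How many female patients are there?"
tree = Node("Query", [
    Node("Agg", [Node("Column", value="subject_id")], value="COUNT"),
    Node("Table", value="demographic"),
    Node("Cond", [Node("Column", value="gender"),
                  Node("Literal", value="F")], value="="),
])

# In-memory database with made-up rows, used only to run the queries.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demographic (subject_id INTEGER, gender TEXT)")
conn.executemany("INSERT INTO demographic VALUES (?, ?)",
                 [(1, "F"), (2, "M"), (3, "F")])

gold = "SELECT COUNT(subject_id) FROM demographic WHERE gender = 'F'"
predicted = to_sql(tree)
alternative = "SELECT COUNT(*) FROM demographic WHERE gender = 'F'"

print(predicted)
print(logic_form_match(predicted, gold), execution_match(conn, predicted, gold))
print(logic_form_match(alternative, gold), execution_match(conn, alternative, gold))
```

The `alternative` query differs from the gold query as a string but returns the same rows on this toy data, which is one reason an execution score can be higher than a logic form score, as with the 0.899 and 0.784 reported above.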
Pages: 14