A BERT-Based Generation Model to Transform Medical Texts to SQL Queries for Electronic Medical Records: Model Development and Validation

Cited by: 7
Authors
Pan, Youcheng [1]
Wang, Chenghao [1]
Hu, Baotian [1]
Xiang, Yang [2]
Wang, Xiaolong [1]
Chen, Qingcai [1, 2]
Chen, Junjie [1]
Du, Jingcheng [3]
Affiliations
[1] Harbin Inst Technol, Intelligent Comp Res Ctr, 6,Pingshan 1st Rd, Shenzhen 518055, Peoples R China
[2] Peng Cheng Lab, Shenzhen, Peoples R China
[3] Univ Texas Hlth Sci Ctr Houston, Houston, TX 77030 USA
Keywords
electronic medical record; text-to-SQL generation; BERT; grammar-based decoding; tree-structured intermediate representation
DOI
10.2196/32698
Chinese Library Classification number
R-058
Abstract
Background: Electronic medical records (EMRs) are usually stored in relational databases, so retrieving information of interest requires SQL queries. Writing such queries can be challenging for medical experts who lack database expertise. Existing text-to-SQL generation methods have not yet been widely adopted in the medical domain.
Objective: This study aimed to propose a neural generation model that jointly considers the characteristics of medical text and the structure of SQL to automatically transform medical texts into SQL queries for EMRs.
Methods: We proposed a medical text-to-SQL model (MedTS) that uses a pretrained Bidirectional Encoder Representations From Transformers (BERT) model as the encoder and a grammar-based long short-term memory (LSTM) network as the decoder. The decoder predicts an intermediate representation that can easily be transformed into the final SQL query. We adopted a syntax tree as the intermediate representation rather than treating the SQL query as an ordinary word sequence; this is more in line with the tree-structured nature of SQL and also reduces the search space during generation. Experiments were conducted on the MIMICSQL dataset, and MedTS was compared with 5 competitor methods.
Results: On the test set, MedTS achieved an accuracy of 0.784 for logic form and 0.899 for execution, significantly outperforming the existing state-of-the-art methods. Further analyses showed that performance on each component of the generated SQL was relatively balanced and substantially improved.
Conclusions: MedTS was effective and robust in improving medical text-to-SQL generation, indicating strong potential for application in real medical scenarios.
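The two ideas in the abstract, a tree-structured intermediate representation that is flattened into SQL and the distinction between logic form accuracy and execution accuracy, can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not the MedTS implementation: the `Node` class, the `to_sql`, `logic_form_match`, and `execution_match` helpers, and the `patients` table with its columns are hypothetical and do not reflect the MIMICSQL schema or the paper's grammar.

```python
import sqlite3
from dataclasses import dataclass, field


@dataclass
class Node:
    """One node of a toy syntax-tree intermediate representation (hypothetical)."""
    label: str
    children: list = field(default_factory=list)


def to_sql(node):
    """Flatten the toy tree into a SQL string with a recursive walk."""
    if node.label == "Select":
        cols, table, *rest = node.children
        sql = f"SELECT {to_sql(cols)} FROM {to_sql(table)}"
        if rest:  # optional WHERE condition
            sql += f" WHERE {to_sql(rest[0])}"
        return sql
    if node.label == "Columns":
        return ", ".join(to_sql(c) for c in node.children)
    if node.label == "Agg":
        func, col = node.children
        return f"{func.label}({to_sql(col)})"
    if node.label == "Cond":
        col, op, val = node.children
        return f"{to_sql(col)} {op.label} {to_sql(val)}"
    return node.label  # leaf: column name, table name, operator, or value


def logic_form_match(pred, gold):
    """Toy logic form check: string equality after case and whitespace normalization."""
    def norm(s):
        return " ".join(s.lower().split())
    return norm(pred) == norm(gold)


def execution_match(conn, pred, gold):
    """Toy execution check: both queries return the same rows on the database."""
    try:
        return conn.execute(pred).fetchall() == conn.execute(gold).fetchall()
    except sqlite3.Error:
        return False  # a query that fails to run counts as a miss


# Hypothetical in-memory table, not the MIMICSQL schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE patients (patient_id INTEGER, age INTEGER)")
conn.executemany("INSERT INTO patients VALUES (?, ?)", [(1, 72), (2, 45), (3, 66)])

# Tree for "how many patients are older than 60?"
tree = Node("Select", [
    Node("Columns", [Node("Agg", [Node("COUNT"), Node("patient_id")])]),
    Node("patients"),
    Node("Cond", [Node("age"), Node(">"), Node("60")]),
])

pred = to_sql(tree)
gold = "SELECT COUNT(*) FROM patients WHERE age > 60"

print(pred)
print("logic form match:", logic_form_match(pred, gold))
print("execution match:", execution_match(conn, pred, gold))
```

In this toy case the predicted and gold queries differ as strings but return the same rows, which is one reason execution accuracy can be higher than logic form accuracy for the same model.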
Pages: 14
Related papers
50 items in total
  • [11] ELECTRONIC MEDICAL RECORDS SYSTEM VALIDATION OF CLAIMS-BASED QUERIES FOR ACUTE RENAL FAILURE
    Rosenman, Marc B.
    Wahl, Peter
    Daniel, Greg
    Overhage, J. Marc
    McGuire, Patricia
    Thompson, Dan
    Rodgers, Keith
    Short, Louise
    Bohn, Rhonda
    Tierney, William M.
    AMERICAN JOURNAL OF KIDNEY DISEASES, 2009, 53 (04) : A66 - A66
  • [12] Medical aided diagnosis using electronic medical records based on LDA and word vector model
    Jin, Xin-Yu
    Pu, Dong-Xu
    Lan, Yi-Zheng
    Li, Lan-Juan
    2017 4TH INTERNATIONAL CONFERENCE ON INFORMATION SCIENCE AND CONTROL ENGINEERING (ICISCE), 2017, : 443 - 445
  • [13] The adoption level of electronic medical records in Hebron hospitals based on the electronic medical record adoption model (EMRAM)
    Najjar, Arwa
    Amro, Belal
    Macedo, Mario
    HEALTH POLICY AND TECHNOLOGY, 2021, 10 (04)
  • [14] Validation of LungFlag™ Prediction Model Using Electronic Medical Records (EMR) On Taiwan Data
    Choman, E. N.
    Lanyado, A.
    Israeli, E.
    Olghi, N.
    Jin, Y.
    Tsai, S. -Y.
    Liu, S. -Y.
    Obradovic, M.
    Yang, P. -C.
    JOURNAL OF THORACIC ONCOLOGY, 2024, 19 (10) : S370 - S370
  • [15] Enhancing Sentiment Analysis for Chinese Texts Using a BERT-Based Model with a Custom Attention Mechanism
    Ding, Linlin
    Han, Yiming
    Li, Mo
    Li, Dong
    WEB INFORMATION SYSTEMS AND APPLICATIONS, WISA 2024, 2024, 14883 : 172 - 179
  • [16] Evaluation of a Prediction Model for the Development of Atrial Fibrillation in a Repository of Electronic Medical Records
    Kolek, Matthew J.
    Graves, Amy J.
    Xu, Meng
    Bian, Aihua
    Teixeira, Pedro Luis
    Shoemaker, M. Benjamin
    Parvez, Babar
    Xu, Hua
    Heckbert, Susan R.
    Ellinor, Patrick T.
    Benjamin, Emelia J.
    Alonso, Alvaro
    Denny, Joshua C.
    Moons, Karel G. M.
    Shintani, Ayumi K.
    Harrell, Frank E., Jr.
    Roden, Dan M.
    Darbar, Dawood
    JAMA CARDIOLOGY, 2016, 1 (09) : 1007 - 1013
  • [17] Achieving Data Completeness in Electronic Medical Records: A Conceptual Model and Hypotheses Development
    Liu, Caihua
    Zowghi, Didar
    Talaei-Khoei, Amir
    Daniel, Jay
    PROCEEDINGS OF THE 51ST ANNUAL HAWAII INTERNATIONAL CONFERENCE ON SYSTEM SCIENCES (HICSS), 2018, : 2824 - 2833
  • [18] Fall Ascertainment and Development of a Risk Prediction Model Using Electronic Medical Records
    Oshiro, Caryn E. S.
    Frankland, Timothy B.
    Rosales, A. Gabriela
    Perrin, Nancy A.
    Bell, Christina L.
    Lo, Serena H. Y.
    Trinacty, Connie M.
    JOURNAL OF THE AMERICAN GERIATRICS SOCIETY, 2019, 67 (07) : 1417 - 1422
  • [19] A design rationale-based model as an add-on to electronic medical records
    Billa, Cleo
    Barsottini, Claudia
    Wainer, Jacques
    PROCEEDINGS OF THE 21ST IEEE INTERNATIONAL SYMPOSIUM ON COMPUTER-BASED MEDICAL SYSTEMS, 2008, : 503 - +
  • [20] A MODEL FOR CRITIQUING BASED ON AUTOMATED MEDICAL RECORDS
    VANDERLEI, J
    MUSEN, MA
    COMPUTERS AND BIOMEDICAL RESEARCH, 1991, 24 (04) : 344 - 378