A BERT-Based Generation Model to Transform Medical Texts to SQL Queries for Electronic Medical Records: Model Development and Validation

Cited by: 7
Authors
Pan, Youcheng [1]
Wang, Chenghao [1]
Hu, Baotian [1]
Xiang, Yang [2]
Wang, Xiaolong [1]
Chen, Qingcai [1, 2]
Chen, Junjie [1]
Du, Jingcheng [3]
Affiliations
[1] Harbin Inst Technol, Intelligent Comp Res Ctr, 6,Pingshan 1st Rd, Shenzhen 518055, Peoples R China
[2] Peng Cheng Lab, Shenzhen, Peoples R China
[3] Univ Texas Hlth Sci Ctr Houston, Houston, TX 77030 USA
Keywords
electronic medical record; text-to-SQL generation; BERT; grammar-based decoding; tree-structured intermediate representation
DOI
10.2196/32698
Chinese Library Classification (CLC) number
R-058
Abstract
Background: Electronic medical records (EMRs) are usually stored in relational databases, so retrieving information of interest requires SQL queries. Writing such queries can be challenging for medical experts, who often lack the necessary database expertise. Existing text-to-SQL generation methods have not been widely adopted in the medical domain.
Objective: This study aimed to propose a neural generation model that jointly considers the characteristics of medical text and the structure of SQL in order to automatically transform medical texts into SQL queries for EMRs.
Methods: We proposed a medical text-to-SQL model (MedTS) that uses a pretrained Bidirectional Encoder Representations From Transformers (BERT) model as the encoder and a grammar-based long short-term memory (LSTM) network as the decoder. The decoder predicts an intermediate representation that can easily be transformed into the final SQL query. We adopted a syntax tree as the intermediate representation rather than treating the SQL query as an ordinary word sequence; this better matches the tree-structured nature of SQL and also reduces the search space during generation. Experiments were conducted on the MIMICSQL dataset, and MedTS was compared with 5 competitor methods.
Results: On the test set, MedTS achieved a logic form accuracy of 0.784 and an execution accuracy of 0.899, significantly outperforming the existing state-of-the-art methods. Further analyses showed that performance was relatively balanced across the components of the generated SQL, with substantial improvements on each.
Conclusions: The proposed MedTS was effective and robust in improving the performance of medical text-to-SQL generation, indicating strong potential for application in real medical scenarios.
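The idea in the Methods section, predicting a syntax tree and then converting it to SQL, can be illustrated with a minimal sketch. This is not the MedTS grammar or decoder: the node labels (`Query`, `Select`, `Agg`, `From`, `Where`, `Cond`), the `to_sql` function, and the table and column names are all hypothetical, chosen only to show how a tree-structured intermediate representation maps deterministically to a query string.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """One node of a toy syntax tree: a label plus ordered children."""
    label: str
    children: List["Node"] = field(default_factory=list)


def to_sql(node: Node) -> str:
    """Render the toy tree into a SQL string with a recursive walk."""
    kids = node.children
    if node.label == "Query":
        return " ".join(to_sql(k) for k in kids)
    if node.label == "Select":
        return "SELECT " + ", ".join(to_sql(k) for k in kids)
    if node.label == "Agg":    # children: function name, column
        return f"{kids[0].label}({kids[1].label})"
    if node.label == "From":
        return "FROM " + ", ".join(k.label for k in kids)
    if node.label == "Where":
        return "WHERE " + " AND ".join(to_sql(k) for k in kids)
    if node.label == "Cond":   # children: column, operator, value
        return f"{kids[0].label} {kids[1].label} {kids[2].label}"
    return node.label          # plain leaf, e.g. a column name


# Hypothetical tree for "How many female patients are older than 60?"
# (table and column names are illustrative, not the MIMICSQL schema).
tree = Node("Query", [
    Node("Select", [Node("Agg", [Node("COUNT"), Node("patient_id")])]),
    Node("From", [Node("patients")]),
    Node("Where", [
        Node("Cond", [Node("age"), Node(">"), Node("60")]),
        Node("Cond", [Node("gender"), Node("="), Node("'F'")]),
    ]),
])

sql = to_sql(tree)
print(sql)
```

Because each node type admits only certain children, a decoder that expands such a tree has far fewer valid choices at each step than one that emits arbitrary tokens, which is the search-space reduction the abstract refers to.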
Pages: 14
Related papers
50 in total
  • [1] EHR-BERT: A BERT-based model for effective anomaly detection in electronic health records
    Niu, Haoran
    Omitaomu, Olufemi A.
    Langston, Michael A.
    Olama, Mohammad
    Ozmen, Ozgur
    Klasky, Hilda B.
    Laurio, Angela
    Ward, Merry
    Nebeker, Jonathan
    JOURNAL OF BIOMEDICAL INFORMATICS, 2024, 150
  • [2] Improving BERT-Based Model for Medical Text Classification with an Optimization Algorithm
    Gasmi, Karim
    ADVANCES IN COMPUTATIONAL COLLECTIVE INTELLIGENCE, ICCCI 2022, 2022, 1653 : 101 - 111
  • [3] Integration of natural and deep artificial cognitive models in medical images: BERT-based NER and relation extraction for electronic medical records
    Guo, Bo
    Liu, Huaming
    Niu, Lei
    FRONTIERS IN NEUROSCIENCE, 2023, 17
  • [4] BERT-Based Chinese Medical Keyphrase Extraction Model Enhanced with External Features
    Ding, Liangping
    Zhang, Zhixiong
    Zhao, Yang
    TOWARDS OPEN AND TRUSTWORTHY DIGITAL SOCIETIES, ICADL 2021, 2021, 13133 : 167 - 176
  • [5] BERT-Based Neural Network for Inpatient Fall Detection From Electronic Medical Records: Retrospective Cohort Study
    Cheligeer, Cheligeer
    Wu, Guosong
    Lee, Seungwon
    Pan, Jie
    Southern, Danielle A.
    Martin, Elliot A.
    Sapiro, Natalie
    Eastwood, Cathy A.
    Quan, Hude
    Xu, Yuan
    JMIR MEDICAL INFORMATICS, 2024, 12
  • [6] Semantic Search Engine Within Anatomy Books: A BERT-Based Model for Medical Students
    El Malhi, Marouane
    Talbi, Mohammed
    Lamti, Sanae
    Kerzazi, Noureddine
    ADVANCES IN SMART MEDICAL, IOT & ARTIFICIAL INTELLIGENCE, VOL 1, ICSMAI 2024, 2024, 11 : 108 - 115
  • [7] External Validation of a Prediction Model for the Development of Atrial Fibrillation in a Repository of Electronic Medical Records
    Kolek, Matthew J.
    Graves, Amy J.
    Bian, Aihua
    Teixeira, Pedro L.
    Shoemaker, Moore B.
    Parvez, Babar
    Xu, Hua
    Heckbert, Susan R.
    Ellinor, Patrick T.
    Benjamin, Emelia J.
    Alonso, Alvaro
    Denny, Joshua C.
    Moons, Karel G.
    Shintani, Ayumi K.
    Roden, Dan M.
    Darbar, Dawood
    CIRCULATION, 2014, 130
  • [8] Text-to-SQL Generation for Question Answering on Electronic Medical Records
    Wang, Ping
    Shi, Tian
    Reddy, Chandan K.
    WEB CONFERENCE 2020: PROCEEDINGS OF THE WORLD WIDE WEB CONFERENCE (WWW 2020), 2020 : 350 - 361
  • [9] A BERT-BiGRU-CRF Model for Entity Recognition of Chinese Electronic Medical Records
    Qin, Qiuli
    Zhao, Shuang
    Liu, Chunmei
    COMPLEXITY, 2021, 2021
  • [10] A BERT-based review helpfulness prediction model utilizing consistency of ratings and texts
    Li, Xinzhe
    Li, Qinglong
    Ryu, Dongyeop
    Kim, Jaekyeong
    APPLIED INTELLIGENCE, 2025, 55 (06)