A BERT-Based Generation Model to Transform Medical Texts to SQL Queries for Electronic Medical Records: Model Development and Validation

Cited by: 7
Authors
Pan, Youcheng [1]
Wang, Chenghao [1]
Hu, Baotian [1]
Xiang, Yang [2]
Wang, Xiaolong [1]
Chen, Qingcai [1,2]
Chen, Junjie [1]
Du, Jingcheng [3]
Affiliations
[1] Harbin Institute of Technology, Intelligent Computing Research Center, 6 Pingshan 1st Road, Shenzhen 518055, People's Republic of China
[2] Peng Cheng Laboratory, Shenzhen, People's Republic of China
[3] University of Texas Health Science Center at Houston, Houston, TX 77030, USA
Keywords
electronic medical record; text-to-SQL generation; BERT; grammar-based decoding; tree-structured intermediate representation
DOI
10.2196/32698
Chinese Library Classification number
R-058
Abstract
Background: Electronic medical records (EMRs) are usually stored in relational databases, so retrieving information of interest requires SQL queries. Writing such queries can be challenging for medical experts who lack database expertise, and existing text-to-SQL generation methods have not yet been widely adopted in the medical domain.
Objective: This study aimed to propose a neural generation model that jointly considers the characteristics of medical text and the structure of SQL to automatically transform medical texts into SQL queries for EMRs.
Methods: We proposed a medical text-to-SQL model (MedTS), which uses a pretrained Bidirectional Encoder Representations From Transformers (BERT) model as the encoder and a grammar-based long short-term memory (LSTM) network as the decoder. The decoder predicts an intermediate representation that can easily be transformed into the final SQL query. Rather than treating the SQL query as an ordinary word sequence, we adopted the syntax tree as the intermediate representation; this is more consistent with the tree-structured nature of SQL and also effectively reduces the search space during generation. Experiments were conducted on the MIMICSQL dataset, and MedTS was compared with 5 competitor methods.
Results: On the test set, MedTS achieved an accuracy of 0.784 in terms of logic form and 0.899 in terms of execution, significantly outperforming the existing state-of-the-art methods. Further analyses showed that performance across the individual components of the generated SQL was relatively balanced, with substantial improvements on each.
Conclusions: The proposed MedTS was effective and robust in improving medical text-to-SQL generation, indicating strong potential for application in real-world medical scenarios.
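The Methods paragraph contrasts decoding a syntax tree with decoding SQL as a flat word sequence, and the Results report both logic-form and execution accuracy. The following Python sketch illustrates these ideas on a toy scale; it is not the MedTS implementation. The grammar rules, the `QueryTree`/`Cond` classes, the fixed rule scores (standing in for the LSTM decoder's outputs), and the `demographic` table are all hypothetical assumptions, and the two match functions are simplified stand-ins for the metrics used in the study.

```python
import sqlite3
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Hypothetical toy grammar (NOT the MedTS grammar): for each nonterminal, the
# production rules a grammar-based decoder may choose when expanding it.
GRAMMAR: Dict[str, List[str]] = {
    "Select": ["Select -> Column", "Select -> Agg Column"],
    "Agg": ["Agg -> COUNT", "Agg -> MAX", "Agg -> MIN", "Agg -> AVG"],
    "Where": ["Where -> Cond", "Where -> Cond AND Cond"],
}


def constrained_choice(nonterminal: str, scores: Dict[str, float]) -> str:
    """Pick the best-scoring rule among those valid for the current nonterminal.

    Rules for other nonterminals are never candidates, which is how
    grammar-based decoding shrinks the search space and rules out
    syntactically invalid output.
    """
    candidates = GRAMMAR[nonterminal]
    return max(candidates, key=lambda rule: scores.get(rule, float("-inf")))


@dataclass
class Cond:
    column: str
    op: str
    value: object


@dataclass
class QueryTree:
    """Hypothetical tree-structured intermediate representation of a simple SELECT."""
    table: str
    column: str
    agg: Optional[str] = None
    conds: List[Cond] = field(default_factory=list)


def render_value(value: object) -> str:
    # Quote strings (escaping embedded single quotes); leave numbers as-is.
    if isinstance(value, str):
        return "'" + value.replace("'", "''") + "'"
    return str(value)


def to_sql(tree: QueryTree) -> str:
    """Deterministically linearize the tree into an SQL string."""
    select = f"{tree.agg}({tree.column})" if tree.agg else tree.column
    sql = f"SELECT {select} FROM {tree.table}"
    if tree.conds:
        where = " AND ".join(
            f"{c.column} {c.op} {render_value(c.value)}" for c in tree.conds
        )
        sql += f" WHERE {where}"
    return sql


def logic_form_match(pred_sql: str, gold_sql: str) -> bool:
    # Simplified assumption: exact string match after collapsing whitespace.
    return " ".join(pred_sql.split()) == " ".join(gold_sql.split())


def execution_match(conn: sqlite3.Connection, pred_sql: str, gold_sql: str) -> bool:
    # Two queries match on execution if they return the same rows.
    return conn.execute(pred_sql).fetchall() == conn.execute(gold_sql).fetchall()


if __name__ == "__main__":
    # Hypothetical scores standing in for decoder outputs: "Agg -> COUNT" scores
    # highest overall but is not a legal expansion of "Where".
    scores = {"Agg -> COUNT": 5.0, "Where -> Cond AND Cond": 2.0, "Where -> Cond": 1.0}
    print(constrained_choice("Where", scores))  # Where -> Cond AND Cond

    tree = QueryTree("demographic", "subject_id", "COUNT",
                     [Cond("gender", "=", "F"), Cond("age", ">", 50)])
    pred = to_sql(tree)
    print(pred)

    # Toy in-memory table; the schema is illustrative, not the MIMICSQL schema.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE demographic (subject_id INTEGER, gender TEXT, age INTEGER)")
    conn.executemany("INSERT INTO demographic VALUES (?, ?, ?)",
                     [(1, "F", 70), (2, "M", 55), (3, "F", 40)])
    # Same semantics as pred, with the WHERE conditions in the opposite order.
    gold = "SELECT COUNT(subject_id) FROM demographic WHERE age > 50 AND gender = 'F'"
    print(logic_form_match(pred, gold), execution_match(conn, pred, gold))  # False True
```

In this toy run the predicted and gold queries differ only in the order of their WHERE conditions, so they fail the string comparison but return the same rows. This is one plausible reason an execution-based accuracy can be higher than a logic-form accuracy.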
Pages: 14