A pre-trained BERT for Korean medical natural language processing

Cited by: 0
Authors
Yoojoong Kim
Jong-Ho Kim
Jeong Moon Lee
Moon Joung Jang
Yun Jin Yum
Seongtae Kim
Unsub Shin
Young-Min Kim
Hyung Joon Joo
Sanghoun Song
Affiliations
[1] The Catholic University of Korea, School of Computer Science and Information Engineering
[2] Korea University Research Institute for Medical Bigdata Science, Department of Biostatistics
[3] Korea University, Department of Linguistics
[4] Korea University College of Medicine, Department of Cardiology, Cardiovascular Center
[5] Korea University, Department of Medical Informatics
[6] Korea University College of Medicine, School of Interdisciplinary Industrial Studies
[7] Korea University College of Medicine
[8] Hanyang University
Source
Scientific Reports, 2022, 12(1), Article 13847
Keywords
DOI
Not available
Chinese Library Classification number
Subject classification number
Abstract
With advances in deep learning and natural language processing (NLP), the analysis of medical texts is becoming increasingly important. Despite this importance, no research on Korean medical-specific language models has been conducted. Korean medical text is particularly difficult to analyze because of the agglutinative characteristics of the language and the complex terminology of the medical domain. To address this problem, we collected a Korean medical corpus and used it to train language models. In this paper, we present a Korean medical language model based on deep learning NLP. The model was trained for the medical context using the BERT pre-training framework, starting from a state-of-the-art Korean language model. The pre-trained model showed accuracy increases of 0.147 and 0.148 for the masked language model with next sentence prediction. In the intrinsic evaluation, the next sentence prediction accuracy improved by 0.258, a remarkable enhancement. In addition, the extrinsic evaluation on Korean medical semantic textual similarity data showed a 0.046 increase in Pearson correlation, and the evaluation on Korean medical named entity recognition showed a 0.053 increase in F1-score.
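The pre-training setup summarized in the abstract (continued BERT pre-training of a Korean base model on a medical corpus with the masked language modeling and next sentence prediction objectives) can be sketched with the Hugging Face transformers API roughly as follows. This is a minimal illustration, not the authors' implementation: the base checkpoint klue/bert-base, the toy sentence pair, and the 15% masking rate are assumptions made for the example, and an actual run would iterate over sentence pairs drawn from the collected Korean medical corpus.

```python
# Minimal sketch (not the authors' code) of continued BERT pre-training with the
# two objectives named in the abstract: masked language modeling (MLM) and
# next sentence prediction (NSP). Checkpoint and sentences are placeholders.
import torch
from transformers import BertForPreTraining, BertTokenizerFast

base_checkpoint = "klue/bert-base"            # assumed Korean BERT base model (placeholder)
tokenizer = BertTokenizerFast.from_pretrained(base_checkpoint)
model = BertForPreTraining.from_pretrained(base_checkpoint)

# Toy consecutive sentence pair standing in for the Korean medical corpus.
sent_a = "환자는 고혈압 진단을 받았다."        # "The patient was diagnosed with hypertension."
sent_b = "경구 항고혈압제 투여가 시작되었다."  # "Oral antihypertensive therapy was started."
enc = tokenizer(sent_a, sent_b, return_tensors="pt")

# MLM: mask ~15% of non-special tokens and predict them.
labels = enc["input_ids"].clone()
special = torch.tensor(
    tokenizer.get_special_tokens_mask(labels[0].tolist(), already_has_special_tokens=True),
    dtype=torch.bool,
)
masked = (torch.rand(labels.shape) < 0.15) & ~special
masked[0, 1] = True                            # guarantee at least one masked position
labels[~masked] = -100                         # loss is computed only on masked positions
enc["input_ids"][masked] = tokenizer.mask_token_id

# NSP: label 0 means sent_b really follows sent_a (1 would mean a random sentence).
outputs = model(**enc, labels=labels, next_sentence_label=torch.tensor([0]))
print(float(outputs.loss))                     # combined MLM + NSP loss for this pair
outputs.loss.backward()                        # an optimizer step would follow here
```

The intrinsic evaluation mentioned above (masked-token and next-sentence accuracy) could be read off the same two heads by comparing outputs.prediction_logits and outputs.seq_relationship_logits against the labels, while the extrinsic tasks (semantic textual similarity and named entity recognition) would instead fine-tune the resulting checkpoint with task-specific heads.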