EMBERT: A Pre-trained Language Model for Chinese Medical Text Mining

Cited by: 3
Authors
Cai, Zerui [1 ]
Zhang, Taolin [2 ,3 ]
Wang, Chengyu [3 ]
He, Xiaofeng [1 ]
Affiliations
[1] East China Normal Univ, Sch Comp Sci & Technol, Shanghai, Peoples R China
[2] East China Normal Univ, Sch Software Engn, Shanghai, Peoples R China
[3] Alibaba Grp, Hangzhou, Peoples R China
Keywords
Pre-trained language model; Chinese medical text mining; Self-supervised learning; Deep context-aware neural network
DOI
10.1007/978-3-030-85896-4_20
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Medical text mining aims to learn models that extract useful information from medical sources. A major challenge is obtaining large-scale labeled data in the medical domain for model training, which is highly expensive. Recent studies show that leveraging massive unlabeled corpora to pre-train language models alleviates this problem through self-supervised learning. In this paper, we propose EMBERT, an entity-level knowledge-enhanced pre-trained language model that leverages several distinct self-supervised tasks for Chinese medical text mining. EMBERT captures fine-grained semantic relations among medical terms through three self-supervised tasks: i) context-entity consistency prediction (whether entities are equivalent in meaning in a given context), ii) entity segmentation (segmenting entities into fine-grained semantic parts), and iii) bidirectional entity masking (predicting the atomic or adjective terms of long entities). Experimental results demonstrate that our model achieves significant improvements over five strong baselines on six public Chinese medical text mining datasets.
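The abstract describes the three pre-training tasks only at a high level. The Python sketch below illustrates one plausible reading of the bidirectional entity masking task: a long entity is split into a modifier ("adjective") part and an atomic head part, one side is masked, and the model must predict it from the other side and the context. The function name, the precomputed split boundary, and the 50/50 side selection are illustrative assumptions, not details taken from the paper.

import random

MASK = "[MASK]"

def bidirectional_entity_mask(tokens, entity_span, boundary):
    """Mask either the modifier part or the atomic head part of a long
    entity, so the model must predict one side from the other.

    tokens:      characters/subwords of the whole sentence
    entity_span: (start, end) indices of the entity, end exclusive
    boundary:    index inside the span separating modifier | atomic parts
                 (in the paper this split would come from the entity
                 segmentation task; here it is supplied directly)
    """
    start, end = entity_span
    masked = list(tokens)
    if random.random() < 0.5:
        mask_range = range(start, boundary)  # mask the modifier part
    else:
        mask_range = range(boundary, end)    # mask the atomic head part
    labels = {i: tokens[i] for i in mask_range}  # positions to predict
    for i in mask_range:
        masked[i] = MASK
    return masked, labels

# Toy example: the entity 急性阑尾炎 (acute appendicitis),
# split as 急性 (modifier) | 阑尾炎 (atomic head).
tokens = list("患者诊断为急性阑尾炎")
masked, labels = bidirectional_entity_mask(tokens, entity_span=(5, 10), boundary=7)
print("".join(masked), labels)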
Pages: 242-257
Number of pages: 16
Related Papers
50 records in total
  • [1] BioHanBERT: A Hanzi-aware Pre-trained Language Model for Chinese Biomedical Text Mining
    Wang, Xiaosu
    Xiong, Yun
    Niu, Hao
    Yue, Jingwen
    Zhu, Yangyong
    Yu, Philip S.
    [J]. 2021 21ST IEEE INTERNATIONAL CONFERENCE ON DATA MINING (ICDM 2021), 2021, : 1415 - 1420
  • [2] Using a Pre-Trained Language Model for Medical Named Entity Extraction in Chinese Clinic Text
    Zhang, Mengyuan
    Wang, Jin
    Zhang, Xuejie
    [J]. PROCEEDINGS OF 2020 IEEE 10TH INTERNATIONAL CONFERENCE ON ELECTRONICS INFORMATION AND EMERGENCY COMMUNICATION (ICEIEC 2020), 2020, : 312 - 317
  • [3] BioBERT: a pre-trained biomedical language representation model for biomedical text mining
    Lee, Jinhyuk
    Yoon, Wonjin
    Kim, Sungdong
    Kim, Donghyeon
    Kim, Sunkyu
    So, Chan Ho
    Kang, Jaewoo
    [J]. BIOINFORMATICS, 2020, 36 (04) : 1234 - 1240
  • [4] BioVAE: a pre-trained latent variable language model for biomedical text mining
    Trieu, Hai-Long
    Miwa, Makoto
    Ananiadou, Sophia
    [J]. BIOINFORMATICS, 2022, 38 (03) : 872 - 874
  • [5] FinBERT: A Pre-trained Financial Language Representation Model for Financial Text Mining
    Liu, Zhuang
    Huang, Degen
    Huang, Kaiyu
    Li, Zhuang
    Zhao, Jun
    [J]. PROCEEDINGS OF THE TWENTY-NINTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2020, : 4513 - 4519
  • [6] Improving text mining in plant health domain with GAN and/or pre-trained language model
    Jiang, Shufan
    Cormier, Stephane
    Angarita, Rafael
    Rousseaux, Francis
    [J]. FRONTIERS IN ARTIFICIAL INTELLIGENCE, 2023, 6
  • [7] ViHealthBERT: Pre-trained Language Models for Vietnamese in Health Text Mining
    Minh Phuc Nguyen
    Vu Hoang Tran
    Vu Hoang
    Ta Duc Huy
    Bui, Trung H.
    Truong, Steven Q. H.
    [J]. LREC 2022: THIRTEENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2022, : 328 - 337
  • [8] A Pre-trained Model for Chinese Medical Record Punctuation Restoration
    Yu, Zhipeng
    Ling, Tongtao
    Gu, Fangqing
    Sheng, Huangxu
    Liu, Yi
    [J]. PATTERN RECOGNITION AND COMPUTER VISION, PRCV 2023, PT VII, 2024, 14431 : 101 - 112
  • [9] RoBERTuito: a pre-trained language model for social media text in Spanish
    Manuel Perez, Juan
    Furman, Damian A.
    Alonso Alemany, Laura
    Luque, Franco
    [J]. LREC 2022: THIRTEENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2022, : 7235 - 7243
  • [10] Leveraging Pre-Trained Language Model for Summary Generation on Short Text
    Zhao, Shuai
    You, Fucheng
    Liu, Zeng Yuan
    [J]. IEEE ACCESS, 2020, 8 : 228798 - 228803