Vocabulary Modifications for Domain-adaptive Pretraining of Clinical Language Models

Authors
Lamproudis, Anastasios [1]
Henriksson, Aron [1]
Dalianis, Hercules [1]
Affiliations
[1] Stockholm Univ, Dept Comp & Syst Sci, Stockholm, Sweden
Keywords
Natural Language Processing; Language Models; Domain-adaptive Pretraining; Clinical Text; Swedish
DOI
10.5220/0010893800003123
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Research has shown that using generic language models - specifically, BERT models - in specialized domains may be sub-optimal due to domain differences in language use and vocabulary. There are several techniques for developing domain-specific language models that leverage existing generic language models, including continued and domain-adaptive pretraining with in-domain data. Here, we investigate a strategy based on using a domain-specific vocabulary, while leveraging a generic language model for initialization. The results demonstrate that domain-adaptive pretraining, in combination with a domain-specific vocabulary - as opposed to a general-domain vocabulary - yields improvements on two downstream clinical NLP tasks for Swedish. The results highlight the value of domain-adaptive pretraining when developing specialized language models and indicate that it is beneficial to adapt the vocabulary of the language model to the target domain prior to continued, domain-adaptive pretraining of a generic language model.
Pages: 180-188
Number of pages: 9