Automated LOINC Standardization Using Pre-trained Large Language Models

Cited by: 0
Authors
Tu, Tao [1 ]
Loreaux, Eric [1 ]
Chesley, Emma [1 ]
Lelkes, Adam D. [1 ]
Gamble, Paul [1 ]
Bellaiche, Mathias [1 ]
Seneviratne, Martin [1 ]
Chen, Ming-Jun [1 ]
Affiliations
[1] Google Research, Mountain View, CA 94043, USA
Keywords
Large Language Model; T5; LOINC; Contrastive Learning; Sentence Embedding; Data Standardization; Medical Entity Linking; Laboratory Data
DOI
Not available
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Harmonization of local source concepts to standard clinical terminologies is a prerequisite for multi-center data aggregation and sharing. Challenges in automating the mapping process stem from the idiosyncratic source encoding schemes adopted by different health systems and the lack of large publicly available training data. In this study, we aim to develop a scalable and generalizable machine learning tool to facilitate standardizing laboratory observations to the Logical Observation Identifiers Names and Codes (LOINC). Specifically, we leverage contextual embeddings from pre-trained T5 models and propose a two-stage fine-tuning strategy based on contrastive learning that enables learning in a few-shot setting without manual feature engineering. Our method uses the unlabeled general LOINC ontology and data augmentation to achieve high accuracy in retrieving the most relevant LOINC targets when only a limited amount of labeled data is available. We further show that our model generalizes well to unseen targets. Taken together, our approach shows great potential to reduce the manual effort involved in LOINC standardization and can be easily extended to mapping other terminologies.
Pages: 343-355
Page count: 13
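
Illustrative sketch (not the authors' released code): the abstract describes embedding local laboratory strings and LOINC target descriptions with a pre-trained T5 encoder, fine-tuning it with a contrastive objective, and ranking LOINC targets by embedding similarity. The Python sketch below assumes the Hugging Face transformers library and PyTorch; the checkpoint name "t5-base", the in-batch InfoNCE loss, mean pooling, and all example strings are assumptions made for illustration, not details taken from the paper.

import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, T5EncoderModel

# Hypothetical checkpoint; the paper fine-tunes pre-trained T5 models.
tokenizer = AutoTokenizer.from_pretrained("t5-base")
encoder = T5EncoderModel.from_pretrained("t5-base")

def embed(texts):
    # Mean-pool the encoder's last hidden states, masking out padding tokens.
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    hidden = encoder(**batch).last_hidden_state            # (B, T, H)
    mask = batch["attention_mask"].unsqueeze(-1).float()   # (B, T, 1)
    pooled = (hidden * mask).sum(dim=1) / mask.sum(dim=1)  # masked mean
    return F.normalize(pooled, dim=-1)                     # unit-norm vectors

def contrastive_loss(source_emb, target_emb, temperature=0.07):
    # In-batch InfoNCE: the i-th source string should score highest against
    # the i-th LOINC target; every other row in the batch acts as a negative.
    logits = source_emb @ target_emb.T / temperature
    labels = torch.arange(logits.size(0))
    return F.cross_entropy(logits, labels)

# Retrieval: rank candidate LOINC long common names for a local source code.
sources = ["HGB BLD QN"]  # made-up idiosyncratic local lab string
targets = [
    "Hemoglobin [Mass/volume] in Blood",
    "Glucose [Mass/volume] in Serum or Plasma",
]
with torch.no_grad():
    scores = embed(sources) @ embed(targets).T  # cosine similarities
print(targets[scores.argmax(dim=-1).item()])

A loss of this form could, in principle, drive both of the paper's fine-tuning stages (first on augmented, unlabeled LOINC ontology text, then on the small labeled mapping set), though the paper's exact objective and pair-construction scheme may differ.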