RxBERT: Enhancing drug labeling text mining and analysis with AI language modeling

Cited by: 4
Authors
Wu, Leihong [1]
Gray, Magnus [1]
Dang, Oanh [2]
Xu, Joshua [1]
Fang, Hong [3]
Tong, Weida [1]
Affiliations
[1] FDA, Div Bioinformat & Biostat, Natl Ctr Toxicol Res, Jefferson, AR 72079 USA
[2] FDA, Off Surveillance & Epidemiol, Ctr Drug Evaluat & Res, Silver Spring, MD 20993 USA
[3] FDA, Off Sci Coordinat, Natl Ctr Toxicol Res, Jefferson, AR 72079 USA
Keywords
Artificial intelligence; natural language processing; language model; BERT; drug labeling; pharmacovigilance
DOI
10.1177/15353702231220669
Chinese Library Classification number
R-3 [Medical research methods]; R3 [Basic medicine]
Discipline classification code
1001
Abstract
The US drug labeling document contains essential information on drug efficacy and safety, making it a crucial regulatory resource for Food and Drug Administration (FDA) drug reviewers. Because of its extensive volume and its reliance on free text, conventional text mining analyses have struggled to process these data. Recent advances in artificial intelligence (AI) for natural language processing (NLP) provide an unprecedented opportunity to identify key information in drug labeling, thereby enhancing safety reviews and supporting regulatory decisions. We developed RxBERT, a Bidirectional Encoder Representations from Transformers (BERT) model pretrained on FDA human prescription drug labeling documents, to improve the use of drug labeling documents in both research and drug review. RxBERT was derived from BioBERT through further training on human prescription drug labeling documents. RxBERT was evaluated on several tasks using regulatory datasets: the National Institute of Standards and Technology Text Analysis Conference dataset (NIST TAC dataset), the FDA Adverse Drug Event Evaluation dataset (ADE Eval dataset), and the classification of texts from submission packages into labeling sections (US Drug Labeling dataset). RxBERT reached an F1-score of 86.5 in both the TAC and ADE Eval classification tasks and a prediction accuracy of 87% on the US Drug Labeling dataset. Overall, RxBERT was competitive with, or outperformed, other NLP approaches such as BERT and BioBERT. In summary, we developed RxBERT, a transformer-based model specific to drug labeling that outperformed the original BERT model. RxBERT has the potential to help research scientists and FDA reviewers better process and use drug labeling information to advance drug effectiveness and safety for public health. This proof-of-concept study also demonstrated a potential pathway to customized large language models (LLMs) tailored to sensitive regulatory documents for internal application.
Pages: 1937-1943
Page count: 7