RxBERT: Enhancing drug labeling text mining and analysis with AI language modeling

Cited by: 4
Authors
Wu, Leihong [1]
Gray, Magnus [1]
Dang, Oanh [2]
Xu, Joshua [1]
Fang, Hong [3]
Tong, Weida [1]
Affiliations
[1] FDA, Div Bioinformat & Biostat, Natl Ctr Toxicol Res, Jefferson, AR 72079 USA
[2] FDA, Off Surveillance & Epidemiol, Ctr Drug Evaluat & Res, Silver Spring, MD 20993 USA
[3] FDA, Off Sci Coordinat, Natl Ctr Toxicol Res, Jefferson, AR 72079 USA
Keywords
Artificial intelligence; natural language processing; language model; BERT; drug labeling; pharmacovigilance
DOI
10.1177/15353702231220669
Chinese Library Classification (CLC) number
R-3 [Medical research methods]; R3 [Basic medicine]
Discipline classification code
1001
Abstract
The US drug labeling document contains essential information on drug efficacy and safety, making it a crucial regulatory resource for Food and Drug Administration (FDA) drug reviewers. Because of its extensive volume and free-text format, conventional text mining analyses have struggled to process these data. Recent advances in artificial intelligence (AI) for natural language processing (NLP) provide an unprecedented opportunity to identify key information in drug labeling, thereby enhancing safety reviews and supporting regulatory decisions. We developed RxBERT, a Bidirectional Encoder Representations from Transformers (BERT) model pretrained on FDA human prescription drug labeling documents, to enhance the use of these documents in both research and drug review. RxBERT was derived from BioBERT through further training on human prescription drug labeling documents. RxBERT was evaluated on several tasks using regulatory datasets: the National Institute of Standards and Technology Text Analysis Conference dataset (NIST TAC dataset), the FDA Adverse Drug Event Evaluation dataset (ADE Eval dataset), and the classification of texts from submission packages into labeling sections (US Drug Labeling dataset). RxBERT achieved an F1-score of 86.5 on both the TAC and ADE Eval classification tasks and a prediction accuracy of 87% on the US Drug Labeling dataset. Overall, RxBERT performed competitively with, or better than, other NLP approaches such as BERT and BioBERT. In summary, we developed RxBERT, a transformer-based model specific to drug labeling that outperformed the original BERT model. RxBERT has the potential to assist research scientists and FDA reviewers in better processing and using drug labeling information to advance drug effectiveness and safety for public health.
This proof-of-concept study also demonstrated a potential pathway toward customized large language models (LLMs) tailored to sensitive regulatory documents for internal use.
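The abstract says only that RxBERT was derived from BioBERT through further training on prescription drug labeling text; it does not state the training objective. Assuming that further training reuses BERT's standard masked-language-modeling (MLM) objective (about 15% of tokens selected for prediction; of those, 80% replaced with `[MASK]`, 10% with a random token, 10% left unchanged), the corruption step applied to each labeling sentence can be sketched as below. This is an illustrative sketch, not the authors' code: the function name `mask_tokens`, the whitespace tokenization, and the example sentence are hypothetical simplifications, since BERT-family models actually operate on WordPiece subword tokens.

```python
import random
from typing import List, Optional, Tuple

MASK_TOKEN = "[MASK]"


def mask_tokens(
    tokens: List[str],
    vocab: List[str],
    mask_prob: float = 0.15,
    rng: Optional[random.Random] = None,
) -> Tuple[List[str], List[Optional[str]]]:
    """Hypothetical sketch of BERT-style MLM input corruption.

    Each token is selected with probability mask_prob. A selected token is
    replaced by [MASK] 80% of the time, by a random vocabulary token 10% of
    the time, and kept unchanged 10% of the time. The returned labels hold
    the original token at selected positions and None elsewhere, i.e. the
    positions the model is not asked to predict.
    """
    if not 0.0 <= mask_prob <= 1.0:
        raise ValueError("mask_prob must be in [0, 1]")
    rng = rng or random.Random()
    inputs: List[str] = []
    labels: List[Optional[str]] = []
    for tok in tokens:
        if rng.random() < mask_prob:
            labels.append(tok)  # the model must recover the original token here
            r = rng.random()
            if r < 0.8:
                inputs.append(MASK_TOKEN)  # 80%: hide the token
            elif r < 0.9:
                inputs.append(rng.choice(vocab))  # 10%: random replacement
            else:
                inputs.append(tok)  # 10%: keep the original token
        else:
            inputs.append(tok)
            labels.append(None)  # not scored by the MLM loss
    return inputs, labels


if __name__ == "__main__":
    # Hypothetical labeling-style sentence, whitespace-tokenized for illustration.
    sentence = "hepatotoxicity has been reported in patients receiving the drug".split()
    masked, labels = mask_tokens(sentence, vocab=sentence, rng=random.Random(42))
    print(masked)
    print(labels)
```

Keeping a fraction of selected tokens unchanged or randomly replaced is the standard BERT design choice for reducing the mismatch between pretraining, where `[MASK]` appears, and fine-tuning on downstream tasks such as the TAC or ADE Eval classification, where it never does.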
Pages: 1937-1943
Page count: 7