RxBERT: Enhancing drug labeling text mining and analysis with AI language modeling

Cited by: 4
Authors
Wu, Leihong [1]
Gray, Magnus [1]
Dang, Oanh [2]
Xu, Joshua [1]
Fang, Hong [3]
Tong, Weida [1]
Affiliations
[1] FDA, Div Bioinformat & Biostat, Natl Ctr Toxicol Res, Jefferson, AR 72079 USA
[2] FDA, Off Surveillance & Epidemiol, Ctr Drug Evaluat & Res, Silver Spring, MD 20993 USA
[3] FDA, Off Sci Coordinat, Natl Ctr Toxicol Res, Jefferson, AR 72079 USA
Keywords
Artificial intelligence; natural language processing; language model; BERT; drug labeling; pharmacovigilance
DOI
10.1177/15353702231220669
Chinese Library Classification (CLC) number
R-3 [Medical research methods]; R3 [Basic medicine]
Discipline classification code
1001
Abstract
The US drug labeling document contains essential information on drug efficacy and safety, making it a crucial regulatory resource for Food and Drug Administration (FDA) drug reviewers. Because of its extensive volume and free-text content, conventional text mining analyses have struggled to process these data. Recent advances in artificial intelligence (AI) for natural language processing (NLP) offer an unprecedented opportunity to identify key information in drug labeling, thereby enhancing safety reviews and supporting regulatory decisions. We developed RxBERT, a Bidirectional Encoder Representations from Transformers (BERT) model pretrained on FDA human prescription drug labeling documents, to improve the use of drug labeling documents in both research and drug review. RxBERT was derived from BioBERT through further training on human prescription drug labeling documents. We demonstrated RxBERT on several tasks using regulatory datasets: the National Institute of Standards and Technology Text Analysis Conference dataset (NIST TAC dataset), the FDA Adverse Drug Event Evaluation dataset (ADE Eval dataset), and the classification of texts from submission packages into labeling sections (US Drug Labeling dataset). RxBERT reached an F1-score of 86.5 in both TAC and ADE Eval classification and a prediction accuracy of 87% on the US Drug Labeling dataset. Overall, RxBERT performed competitively with or better than other NLP approaches such as BERT and BioBERT. In summary, RxBERT is a transformer-based model specific to drug labeling that outperformed the original BERT model. RxBERT has the potential to help research scientists and FDA reviewers better process and use drug labeling information to advance drug effectiveness and safety for public health. This proof-of-concept study also demonstrated a potential pathway to customized large language models (LLMs) tailored to sensitive regulatory documents for internal application.
Pages: 1937-1943
Page count: 7