RxBERT: Enhancing drug labeling text mining and analysis with AI language modeling

Cited by: 4
Authors
Wu, Leihong [1]
Gray, Magnus [1]
Dang, Oanh [2]
Xu, Joshua [1]
Fang, Hong [3]
Tong, Weida [1]
Affiliations
[1] FDA, Div Bioinformat & Biostat, Natl Ctr Toxicol Res, Jefferson, AR 72079 USA
[2] FDA, Off Surveillance & Epidemiol, Ctr Drug Evaluat & Res, Silver Spring, MD 20993 USA
[3] FDA, Off Sci Coordinat, Natl Ctr Toxicol Res, Jefferson, AR 72079 USA
Keywords
Artificial intelligence; natural language processing; language model; BERT; drug labeling; pharmacovigilance
DOI
10.1177/15353702231220669
Chinese Library Classification number
R-3 [Medical research methods]; R3 [Basic medicine]
Discipline classification code
1001
Abstract
The US drug labeling document contains essential information on drug efficacy and safety, making it a crucial regulatory resource for Food and Drug Administration (FDA) drug reviewers. Because of its extensive volume and the presence of free text, conventional text mining analyses have encountered challenges in processing these data. Recent advances in artificial intelligence (AI) for natural language processing (NLP) have provided an unprecedented opportunity to identify key information from drug labeling, thereby enhancing safety reviews and support for regulatory decisions. We developed RxBERT, a Bidirectional Encoder Representations from Transformers (BERT) model pretrained on FDA human prescription drug labeling documents, to improve the use of drug labeling documents in both research and drug review. RxBERT was derived from BioBERT with further training on human prescription drug labeling documents. RxBERT was demonstrated in several tasks using regulatory datasets, including the National Institute of Standards and Technology Text Analysis Conference dataset (NIST TAC dataset), the FDA Adverse Drug Event Evaluation dataset (ADE Eval dataset), and the classification of texts from submission packages into labeling sections (US Drug Labeling dataset). RxBERT reached an F1-score of 86.5 in both TAC and ADE Eval classification, and a prediction accuracy of 87% for the US Drug Labeling dataset. Overall, RxBERT was shown to be competitive with or better than other NLP approaches such as BERT and BioBERT. In summary, we developed RxBERT, a transformer-based model specific to drug labeling that outperformed the original BERT model. RxBERT has the potential to assist research scientists and FDA reviewers in better processing and utilizing drug labeling information toward the advancement of drug effectiveness and safety for public health. This proof-of-concept study also demonstrated a potential pathway to customized large language models (LLMs) tailored to sensitive regulatory documents for internal application.
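For readers unfamiliar with the F1 metric quoted in the abstract, the following is a minimal sketch of how a binary F1-score is computed from precision and recall. The labels are toy values invented for illustration and are not taken from the TAC or ADE Eval datasets; the function name `precision_recall_f1` is hypothetical, and the record does not state which F1 averaging scheme the authors used.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Return (precision, recall, F1) for a single positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy labels (assumption, for illustration only): 1 = sentence mentions an
# adverse drug event, 0 = it does not.
y_true = [1, 1, 1, 0, 0, 1, 0, 1]
y_pred = [1, 0, 1, 0, 1, 1, 0, 1]

precision, recall, f1 = precision_recall_f1(y_true, y_pred)
# Scores are often reported on a 0-100 scale, as the abstract appears to do.
print(f"precision={precision:.2f} recall={recall:.2f} F1={100 * f1:.1f}")
```

With these toy labels there are four true positives, one false positive, and one false negative, so precision and recall are both 0.8 and the F1-score is 0.8 (80.0 on a 0-100 scale).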
Pages: 1937-1943
Number of pages: 7