TECRR: a benchmark dataset of radiological reports for BI-RADS classification with machine learning, deep learning, and large language model baselines

Cited by: 1
Authors
Hussain, Sadam [1]
Naseem, Usman [2]
Ali, Mansoor [1]
Avalos, Daly Betzabeth Avendano [3]
Cardona-Huerta, Servando [3]
Palomo, Beatriz Alejandra Bosques [1]
Tamez-Pena, Jose Gerardo [3]
Affiliations
[1] Tecnol Monterrey, Sch Engn & Sci, Monterrey 64849, Nuevo Leon, Mexico
[2] Macquarie Univ, Sch Comp, Sydney, NSW 2109, Australia
[3] Tecnol Monterrey, Sch Med, Monterrey 64849, Nuevo Leon, Mexico
Keywords
BI-RADS classification; Breast radiological reports; TF-IDF; Word2vec; NLP; ML; AUTOMATIC CLASSIFICATION; MRI
DOI
10.1186/s12911-024-02717-7
Chinese Library Classification number
R-058
Abstract
Background: Machine learning (ML), deep learning (DL), and natural language processing (NLP) have recently shown promising results in classifying free-text radiological reports. Classifying such reports reliably requires a high-quality, curated, and annotated dataset. Currently, no publicly available breast imaging radiological report dataset exists for classifying Breast Imaging Reporting and Data System (BI-RADS) categories and breast density scores, as defined by the American College of Radiology (ACR). To address this gap, we construct and annotate a dataset of breast imaging radiological reports and provide benchmark results for it. The dataset was originally in Spanish. Board-certified radiologists collected and annotated it according to the BI-RADS lexicon and categories at the Breast Radiology department, TecSalud Hospitals Monterrey, Mexico. It was first translated into English using Google Translate and then preprocessed by removing duplicates and missing values. After preprocessing, the final dataset consists of 5046 unique reports from 5046 patients, all women, with an average age of 53 years. We used word-level NLP embedding techniques, term frequency-inverse document frequency (TF-IDF) and word2vec, to extract semantic and syntactic information. We also compared the performance of ML, DL, and large language model (LLM) classifiers for BI-RADS category classification.

Results: The final breast imaging radiological reports dataset contains 5046 unique reports. We compared K-Nearest Neighbour (KNN), Support Vector Machine (SVM), Naive Bayes (NB), Random Forest (RF), Adaptive Boosting (AdaBoost), Gradient Boosting (GB), Extreme Gradient Boosting (XGB), Long Short-Term Memory (LSTM), Bidirectional Encoder Representations from Transformers (BERT), and Biomedical Generative Pre-trained Transformer (BioGPT) classifiers. The BioGPT classifier trained on preprocessed data performed best, with a mean sensitivity of 0.60 (95% confidence interval (CI), 0.391-0.812). This is 6 percentage points higher than the second-best classifier, BERT, which achieved a mean sensitivity of 0.54 (95% CI, 0.477-0.607).

Conclusion: In this work, we propose a curated and annotated benchmark dataset that can be used for BI-RADS and breast density category classification. We also provide baseline results for a range of ML, DL, and LLM models for BI-RADS classification that can serve as a starting point for future investigation. The main objective of this work is to provide a repository for investigators who wish to enter the field and push its boundaries further.
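The Results report a mean sensitivity with a 95% CI for each classifier. The abstract does not say how the mean or the interval was computed, so the following is only a minimal sketch under stated assumptions: sensitivity is averaged over classes (macro-averaged recall), and the interval is a normal approximation over per-fold scores. The function names, toy labels, and fold scores are hypothetical and are not taken from the paper.

```python
from math import sqrt
from statistics import mean, stdev


def per_class_sensitivity(y_true, y_pred):
    """Return {label: TP / (TP + FN)} for every label that occurs in y_true."""
    result = {}
    for label in sorted(set(y_true)):
        pairs = [(t, p) for t, p in zip(y_true, y_pred) if t == label]
        true_pos = sum(1 for _, p in pairs if p == label)
        result[label] = true_pos / len(pairs)  # len(pairs) == TP + FN
    return result


def mean_sensitivity(y_true, y_pred):
    """Unweighted mean of per-class sensitivities (assumed averaging scheme)."""
    return mean(per_class_sensitivity(y_true, y_pred).values())


def normal_ci(scores, z=1.96):
    """Mean and normal-approximation 95% CI over repeated scores (assumption)."""
    centre = mean(scores)
    half_width = z * stdev(scores) / sqrt(len(scores))
    return centre, centre - half_width, centre + half_width


# Made-up category labels and predictions, for illustration only.
y_true = [1, 1, 2, 2, 3, 3]
y_pred = [1, 2, 2, 2, 3, 1]
print(per_class_sensitivity(y_true, y_pred))
print(mean_sensitivity(y_true, y_pred))

# Made-up per-fold mean sensitivities, for illustration only.
fold_scores = [0.5, 0.6, 0.7]
print(normal_ci(fold_scores))
```

With only a handful of folds, a t-based or bootstrap interval would usually be wider than this normal approximation, so the sketch should not be read as a way to reproduce the reported intervals.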
Pages: 10