TECRR: a benchmark dataset of radiological reports for BI-RADS classification with machine learning, deep learning, and large language model baselines

Times cited: 1
Authors
Hussain, Sadam [1]
Naseem, Usman [2]
Ali, Mansoor [1]
Avalos, Daly Betzabeth Avendano [3]
Cardona-Huerta, Servando [3]
Palomo, Beatriz Alejandra Bosques [1]
Tamez-Pena, Jose Gerardo [3]
Affiliations
[1] Tecnol Monterrey, Sch Engn & Sci, Monterrey 64849, Nuevo Leon, Mexico
[2] Macquarie Univ, Sch Comp, Sydney, NSW 2109, Australia
[3] Tecnol Monterrey, Sch Med, Monterrey 64849, Nuevo Leon, Mexico
Keywords
BI-RADS classification; Breast radiological reports; TF-IDF; Word2vec; NLP; ML; AUTOMATIC CLASSIFICATION; MRI
DOI
10.1186/s12911-024-02717-7
Chinese Library Classification (CLC) number
R-058
Abstract
Background
Machine learning (ML), deep learning (DL), and natural language processing (NLP) have recently shown promising results in classifying free-form radiological reports in their respective medical domains. Classifying radiological reports properly requires a high-quality annotated and curated dataset. Currently, no publicly available breast imaging-based radiological report dataset exists for classifying Breast Imaging Reporting and Data System (BI-RADS) categories and breast density scores as defined by the American College of Radiology (ACR). To address this gap, we constructed and annotated a breast imaging-based radiological report dataset and report benchmark results on it. The dataset was originally in Spanish. Board-certified radiologists at the Breast Radiology department of TecSalud Hospitals, Monterrey, Mexico, collected the reports and annotated them according to the BI-RADS lexicon and categories. The reports were first translated into English using Google Translate and then preprocessed by removing duplicates and missing values. After preprocessing, the final dataset consists of 5046 unique reports from 5046 patients, all women, with an average age of 53 years. We used word-level NLP embedding techniques, term frequency-inverse document frequency (TF-IDF) and word2vec, to extract semantic and syntactic information, and compared the performance of ML, DL, and large language model (LLM) classifiers for BI-RADS category classification.

Results
The final breast imaging-based radiological report dataset contains 5046 unique reports. We compared K-Nearest Neighbour (KNN), Support Vector Machine (SVM), Naive Bayes (NB), Random Forest (RF), Adaptive Boosting (AdaBoost), Gradient Boosting (GB), Extreme Gradient Boosting (XGB), Long Short-Term Memory (LSTM), Bidirectional Encoder Representations from Transformers (BERT), and Biomedical Generative Pre-trained Transformer (BioGPT) classifiers. The BioGPT classifier with preprocessed data performed best, with a mean sensitivity of 0.60 (95% confidence interval (CI), 0.391-0.812), 6 percentage points higher than the second-best classifier, BERT, which achieved a mean sensitivity of 0.54 (95% CI, 0.477-0.607).

Conclusion
In this work, we propose a curated and annotated benchmark dataset that can be used for BI-RADS and breast density category classification. We also provide baseline results for a broad range of ML, DL, and LLM models for BI-RADS classification, which can serve as a starting point for future investigation. The main objective of this work is to provide a repository for investigators who wish to enter the field and push its boundaries further.
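The abstract names TF-IDF features and a "mean sensitivity" metric but does not give their exact definitions. The following is a minimal Python sketch under stated assumptions, not the authors' code: it computes textbook TF-IDF weights and a macro-averaged sensitivity. The helpers `tokenize`, `tfidf`, and `mean_sensitivity`, the unsmoothed idf, the equal-weight averaging of recall over BI-RADS categories, and the toy reports and labels are all hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: simple lowercase whitespace tokenization; the paper's
    # exact preprocessing (stop words, punctuation, stemming) is not given.
    return text.lower().split()


def tfidf(docs):
    """Return one {term: weight} dict per document (hypothetical helper).

    tf  = count of term in document / number of tokens in document
    idf = log(N / df), where df = number of documents containing the term
    (textbook unsmoothed form; an assumption, not the authors' setting).
    """
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))  # count each term at most once per document
    vectors = []
    for tokens in tokenized:
        if not tokens:
            vectors.append({})  # empty report: no features
            continue
        counts = Counter(tokens)
        length = len(tokens)
        vectors.append(
            {t: (c / length) * math.log(n_docs / df[t]) for t, c in counts.items()}
        )
    return vectors


def mean_sensitivity(y_true, y_pred):
    """Macro-averaged sensitivity (hypothetical helper).

    Assumption: "mean sensitivity" = per-category recall TP / (TP + FN),
    averaged with equal weight over the categories present in y_true.
    """
    recalls = []
    for c in sorted(set(y_true)):
        preds_for_c = [p for t, p in zip(y_true, y_pred) if t == c]
        true_pos = sum(1 for p in preds_for_c if p == c)
        recalls.append(true_pos / len(preds_for_c))
    return sum(recalls) / len(recalls)


if __name__ == "__main__":
    # Toy English report snippets, invented for illustration only.
    reports = [
        "no suspicious findings birads 1",
        "benign calcifications birads 2",
        "suspicious mass birads 4",
    ]
    vecs = tfidf(reports)
    print(vecs[0]["birads"])  # 0.0: the term occurs in every report
    # Toy BI-RADS labels and predictions, also invented.
    print(round(mean_sensitivity([1, 1, 2, 2, 4, 4], [1, 2, 2, 2, 4, 1]), 3))  # 0.667
```

Terms shared by every report (here `birads`) receive zero weight, so TF-IDF discounts phrasing common to all reports. scikit-learn's `TfidfVectorizer` applies a smoothed idf and L2 normalization by default, so its weights will differ from this sketch.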
Pages: 10