TECRR: a benchmark dataset of radiological reports for BI-RADS classification with machine learning, deep learning, and large language model baselines

Cited by: 1
Authors
Hussain, Sadam [1]
Naseem, Usman [2]
Ali, Mansoor [1]
Avalos, Daly Betzabeth Avendano [3]
Cardona-Huerta, Servando [3]
Palomo, Beatriz Alejandra Bosques [1]
Tamez-Pena, Jose Gerardo [3]
Affiliations
[1] Tecnol Monterrey, Sch Engn & Sci, Monterrey 64849, Nuevo Leon, Mexico
[2] Macquarie Univ, Sch Comp, Sydney, NSW 2109, Australia
[3] Tecnol Monterrey, Sch Med, Monterrey 64849, Nuevo Leon, Mexico
Keywords
BI-RADS classification; Breast radiological reports; TF-IDF; Word2vec; NLP; ML; AUTOMATIC CLASSIFICATION; MRI;
DOI
10.1186/s12911-024-02717-7
Chinese Library Classification (CLC) number
R-058
Abstract
Background: Machine learning (ML), deep learning (DL), and natural language processing (NLP) have recently shown promising results in classifying free-form radiological reports. Classifying such reports properly requires a high-quality, curated, and annotated dataset. Currently, no publicly available breast imaging radiological report dataset exists for classifying Breast Imaging Reporting and Data System (BI-RADS) categories and breast density scores, as defined by the American College of Radiology (ACR). To address this gap, we construct and annotate a dataset of breast imaging radiological reports and provide benchmark results on it. The dataset was originally in Spanish. Board-certified radiologists collected and annotated it according to the BI-RADS lexicon and categories at the Breast Radiology department, TecSalud Hospitals Monterrey, Mexico. It was first translated into English using Google Translate and then preprocessed by removing duplicates and missing values. After preprocessing, the final dataset consists of 5046 unique reports from 5046 patients, all women, with an average age of 53 years. We used word-level NLP embedding techniques, term frequency-inverse document frequency (TF-IDF) and word2vec, to extract semantic and syntactic information. We also compared the performance of ML, DL, and large language model (LLM) classifiers for BI-RADS category classification.

Results: The final breast imaging radiological reports dataset contains 5046 unique reports. We compared K-Nearest Neighbour (KNN), Support Vector Machine (SVM), Naive Bayes (NB), Random Forest (RF), Adaptive Boosting (AdaBoost), Gradient Boosting (GB), Extreme Gradient Boosting (XGB), Long Short-Term Memory (LSTM), Bidirectional Encoder Representations from Transformers (BERT), and Biomedical Generative Pre-trained Transformer (BioGPT) classifiers. The BioGPT classifier with preprocessed data performed best, with a mean sensitivity of 0.60 (95% confidence interval (CI), 0.391-0.812). This is 6 percentage points higher than the second-best classifier, BERT, which achieved a mean sensitivity of 0.54 (95% CI, 0.477-0.607).

Conclusion: In this work, we propose a curated and annotated benchmark dataset that can be used for BI-RADS and breast density category classification. We also provide baseline results for a range of ML, DL, and LLM models for BI-RADS classification that can serve as a starting point for future investigation. The main objective of this investigation is to provide a repository for investigators who wish to enter the field and push its boundaries further.
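The abstract names TF-IDF as one of the text representations and reports mean sensitivity as the comparison metric. The sketch below is only an illustration of those two ideas and is not the authors' code. It assumes whitespace tokenization, the plain tf * log(N / df) weighting (libraries such as scikit-learn use smoothed variants), and a macro average over classes for "mean sensitivity"; the abstract does not say how the mean or its confidence interval was computed. The function names and the toy reports are hypothetical.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per document using tf * log(N / df).

    Assumption: tf is the term count divided by document length, and idf is
    the unsmoothed log(N / df). Other TF-IDF variants give different values.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


def mean_sensitivity(y_true, y_pred):
    """Macro-averaged sensitivity: mean over classes of TP / (TP + FN)."""
    recalls = []
    for label in sorted(set(y_true)):
        true_pos = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        support = sum(1 for t in y_true if t == label)
        recalls.append(true_pos / support)
    return sum(recalls) / len(recalls)


# Hypothetical toy reports, not taken from the dataset.
reports = [
    "benign mass no suspicious findings",
    "suspicious mass biopsy recommended",
    "no findings",
]
weights = tfidf(reports)

# Hypothetical BI-RADS labels and predictions for four reports.
score = mean_sensitivity(["1", "1", "2", "4"], ["1", "2", "2", "4"])
```

In this toy example, a term that occurs in only one report (such as "benign") receives a higher weight than one shared by two reports (such as "mass"), which is the property that makes TF-IDF features useful to a downstream classifier.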
Pages: 10