TECRR: a benchmark dataset of radiological reports for BI-RADS classification with machine learning, deep learning, and large language model baselines

Times cited: 1

Authors:

Hussain, Sadam [1]

Naseem, Usman [2]

Ali, Mansoor [1]

Avalos, Daly Betzabeth Avendano [3]

Cardona-Huerta, Servando [3]

Palomo, Beatriz Alejandra Bosques [1]

Tamez-Pena, Jose Gerardo [3]

Affiliations:

[1] Tecnol Monterrey, Sch Engn & Sci, Monterrey 64849, Nuevo Leon, Mexico

[2] Macquarie Univ, Sch Comp, Sydney, NSW 2109, Australia

[3] Tecnol Monterrey, Sch Med, Monterrey 64849, Nuevo Leon, Mexico

Source:

BMC Medical Informatics and Decision Making | 2024 / Vol. 24 / Issue 01

Keywords:

BI-RADS classification; Breast radiological reports; TF-IDF; Word2vec; NLP; ML; Automatic classification; MRI

DOI:

10.1186/s12911-024-02717-7

Chinese Library Classification (CLC) number:

R-058

Abstract:

Background

Recently, machine learning (ML), deep learning (DL), and natural language processing (NLP) have shown promising results in classifying free-text radiological reports. Proper classification of radiological reports requires a high-quality annotated and curated dataset. Currently, no publicly available breast imaging-based radiological report dataset exists for classifying Breast Imaging Reporting and Data System (BI-RADS) categories and breast density scores, as defined by the American College of Radiology (ACR). To address this gap, we construct and annotate a breast imaging-based radiological reports dataset and provide benchmark results on it. The dataset was originally in Spanish. Board-certified radiologists at the Breast Radiology department of TecSalud Hospitals, Monterrey, Mexico, collected and annotated it according to the BI-RADS lexicon and categories. It was first translated into English using Google Translate and then preprocessed by removing duplicate reports and entries with missing values. After preprocessing, the final dataset consists of 5046 unique reports from 5046 patients, all women, with an average age of 53 years. We used two word-level NLP embedding techniques, term frequency-inverse document frequency (TF-IDF) and word2vec, to extract semantic and syntactic information. We also compared the performance of ML, DL, and large language model (LLM) classifiers for BI-RADS category classification.

Results

The final breast imaging-based radiological reports dataset contains 5046 unique reports. We compared K-Nearest Neighbour (KNN), Support Vector Machine (SVM), Naive Bayes (NB), Random Forest (RF), Adaptive Boosting (AdaBoost), Gradient Boosting (GB), Extreme Gradient Boosting (XGB), Long Short-Term Memory (LSTM), Bidirectional Encoder Representations from Transformers (BERT), and Biomedical Generative Pre-trained Transformer (BioGPT) classifiers. The BioGPT classifier using preprocessed data performed best, with a mean sensitivity of 0.60 (95% confidence interval (CI), 0.391-0.812), 6 percentage points higher than the second-best classifier, BERT, which achieved a mean sensitivity of 0.54 (95% CI, 0.477-0.607).

Conclusion

In this work, we propose a curated and annotated benchmark dataset that can be used for BI-RADS and breast density category classification. We also provide baseline results for a range of ML, DL, and LLM models for BI-RADS classification that can serve as a starting point for future investigation. The main objective of this work is to provide a repository for investigators who wish to enter the field and push its boundaries further.
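The abstract names TF-IDF features, classical classifiers such as KNN, and "mean sensitivity" as the evaluation metric. A minimal, standard-library-only Python sketch of such a baseline is shown here, under several assumptions the abstract does not confirm: a smoothed IDF variant, cosine-similarity KNN, and "mean sensitivity" read as per-class recall averaged over BI-RADS categories (macro-averaged recall). The function names (`fit_idf`, `tfidf`, `knn_predict`, `mean_sensitivity`) and the toy report strings are hypothetical illustrations, not the authors' code or data from the dataset.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Lowercase and replace non-alphanumeric characters with spaces before splitting.
    cleaned = "".join(ch.lower() if ch.isalnum() else " " for ch in text)
    return cleaned.split()


def fit_idf(docs):
    # Assumed smoothed IDF variant: log((1 + N) / (1 + df)) + 1.
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    return {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}


def tfidf(doc, idf):
    # Raw term counts weighted by IDF, then L2-normalised; terms unseen in training are dropped.
    counts = Counter(t for t in tokenize(doc) if t in idf)
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {t: v / norm for t, v in vec.items()}


def cosine(a, b):
    # Both vectors are unit-length, so the dot product equals cosine similarity.
    return sum(v * b.get(t, 0.0) for t, v in a.items())


def knn_predict(train_vecs, train_labels, query_vec, k=3):
    # Rank training reports by similarity and take a majority vote among the top k.
    # Counter preserves insertion order, so vote ties go to the label seen first (the nearer one).
    ranked = sorted(range(len(train_vecs)),
                    key=lambda i: cosine(train_vecs[i], query_vec),
                    reverse=True)
    votes = Counter(train_labels[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]


def mean_sensitivity(y_true, y_pred):
    # Assumed definition: sensitivity (recall) = TP / (TP + FN) per class,
    # averaged over the classes that occur in y_true.
    tp, total = defaultdict(int), defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            tp[t] += 1
    return sum(tp[c] / total[c] for c in total) / len(total)


# Toy, invented reports for illustration only (not taken from the dataset).
train = [
    "no suspicious mass benign findings",
    "benign calcifications no mass",
    "irregular spiculated mass highly suspicious",
    "spiculated mass suspicious for malignancy",
]
labels = ["BI-RADS 2", "BI-RADS 2", "BI-RADS 5", "BI-RADS 5"]

idf = fit_idf(train)
vecs = [tfidf(d, idf) for d in train]
test_docs = ["benign findings no mass", "irregular spiculated mass"]
preds = [knn_predict(vecs, labels, tfidf(d, idf), k=1) for d in test_docs]
print(preds)
print(mean_sensitivity(["BI-RADS 2", "BI-RADS 5"], preds))
```

Macro-averaging weights every BI-RADS category equally, so rare categories count as much as common ones; that is one plausible reason to report mean sensitivity rather than accuracy on an imbalanced report collection. In practice a library implementation (for example scikit-learn's `TfidfVectorizer` paired with an SVM) would replace the hand-written pieces; they are spelled out here only to make each step visible.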
