Unlocking the complete blood count as a risk stratification tool for breast cancer using machine learning: a large scale retrospective study

Times cited: 2
Authors
Araujo, Daniella Castro [1 ,4 ]
Rocha, Bruno Aragao [2 ]
Gomes, Karina Braga [3 ]
da Silva, Daniel Noce [1 ]
Ribeiro, Vinicius Moura [1 ]
Kohara, Marco Aurelio [1 ]
Marana, Fernanda Tostes [1 ]
Bitar, Renata Andrade [1 ]
Veloso, Adriano Alonso [4 ]
Pintao, Maria Carolina [2 ]
da Silva, Flavia Helena [2 ]
Viana, Celso Ferraz [2 ]
de Souza, Pedro Henrique Araujo [1 ,5 ]
da Silva, Ismael Dale Cotrim Guerreiro [2 ,6 ]
Affiliations
[1] Huna, Sao Paulo, Brazil
[2] Grp Fleury, Sao Paulo, Brazil
[3] Univ Fed Minas Gerais UFMG, Fac Farm, Dept Anal Clin & Toxicol, Campus Belo Horizonte, Belo Horizonte, MG, Brazil
[4] Univ Fed Minas Gerais UFMG, Dept Ciencias Computacao, Inst Ciencias Exatas, Campus Belo Horizonte, Belo Horizonte, MG, Brazil
[5] Inst Nacl Canc INCA, Dept Oncol Clin Res, Rio De Janeiro, Brazil
[6] Univ Fed Sao Paulo, Dept Gynecol, Escola Paulista Med, Sao Paulo, Brazil
Source
SCIENTIFIC REPORTS | 2024 / Vol. 14 / Issue 1
Funding
São Paulo Research Foundation, Brazil
Keywords
Breast cancer; Screening; Machine learning; Risk stratification; Routine blood tests; CBC; NLR; RBC; CBC-ratios; Hemogram
DOI
10.1038/s41598-024-61215-y
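Chinese Library Classification (CLC) codes
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline classification codes
07; 0710; 09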
Abstract
Optimizing early breast cancer (BC) detection requires effective risk assessment tools. This retrospective study from Brazil demonstrates the efficacy of machine learning in discerning complex patterns within routine blood tests, offering a globally accessible and cost-effective approach to risk evaluation. We analyzed complete blood count (CBC) tests from 396,848 women aged 40 to 70 who underwent breast imaging or biopsy within six months after their CBC. Of these, 2861 (0.72%) were classified as cases: 1882 with BC confirmed by anatomopathological examination and 979 with highly suspicious imaging (BI-RADS 5). The remaining 393,987 participants (99.28%), with BI-RADS 1 or 2 results, were classified as controls. The database was split by diagnostic certainty into a modeling set (used for training and validation) and a testing set. The testing set comprised anatomopathologically confirmed cases and controls who remained cancer-free 4.5 to 6.5 years after the CBC. Our ridge regression model, which uses the neutrophil-to-lymphocyte ratio (NLR), red blood cell count (RBC), and age, achieved an AUC of 0.64 (95% CI 0.64-0.65). These results were slightly better than those of LightGBM, a gradient-boosting machine learning model, and the ridge model has the added benefit of being fully interpretable. Using the model's probabilistic output, we divided the study population into four risk groups (high, moderate, average, and low risk), whose relative ratios of BC were 1.99, 1.32, 1.02, and 0.42, respectively. This stratification aims to streamline prioritization and could improve the early detection of breast cancer, particularly in resource-limited settings. As a risk stratification tool, the model offers the potential for personalized breast cancer screening that prioritizes women by individual risk, marking a shift away from a broad, population-wide strategy.
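The risk-grouping step described in the abstract can be illustrated with a minimal sketch. It assumes that a group's "relative ratio" is its observed BC rate divided by the overall BC rate of the population (the record does not state the exact definition), that predicted probabilities come from an already fitted model such as the ridge model, and that the probability cutoffs separating the groups are hypothetical values chosen only for illustration. The helper names `nlr` and `relative_ratios` are illustrative and are not the authors' code.

```python
from bisect import bisect_right


def nlr(neutrophils: float, lymphocytes: float) -> float:
    """Neutrophil-to-lymphocyte ratio from absolute counts in the same units."""
    if lymphocytes <= 0:
        raise ValueError("lymphocyte count must be positive")
    return neutrophils / lymphocytes


def relative_ratios(probs, labels, cutoffs):
    """Assign each subject to a risk group by predicted probability and
    return, per group, its case rate divided by the overall case rate.

    probs:   predicted BC probabilities from some fitted model (assumed).
    labels:  1 for a case, 0 for a control.
    cutoffs: ascending probability thresholds (hypothetical); k cutoffs
             give k + 1 groups, ordered from lowest to highest risk.
    """
    if len(probs) != len(labels) or not probs:
        raise ValueError("probs and labels must be non-empty and equal length")
    overall = sum(labels) / len(labels)
    if overall == 0:
        raise ValueError("no cases: relative ratios are undefined")

    n_groups = len(cutoffs) + 1
    counts = [0] * n_groups
    cases = [0] * n_groups
    for p, y in zip(probs, labels):
        g = bisect_right(cutoffs, p)  # number of cutoffs <= p, i.e. the group index
        counts[g] += 1
        cases[g] += y

    # Empty groups get NaN rather than a misleading ratio of zero.
    return [
        (cases[g] / counts[g]) / overall if counts[g] else float("nan")
        for g in range(n_groups)
    ]


if __name__ == "__main__":
    # Toy data only; these numbers are not from the study.
    probs = [0.001, 0.002, 0.005, 0.006, 0.010, 0.012, 0.020, 0.030]
    labels = [0, 0, 0, 1, 0, 1, 1, 1]
    print(relative_ratios(probs, labels, [0.004, 0.009, 0.015]))  # [0.0, 1.0, 1.0, 2.0]
```

With these toy inputs, the overall case rate is 0.5, so a group whose case rate is 1.0 gets a ratio of 2.0. Under this assumed definition, a ratio above 1 (such as the reported 1.99 for the high-risk group) means that group has more cases than the population average.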
Pages: 10