Unlocking the complete blood count as a risk stratification tool for breast cancer using machine learning: a large scale retrospective study

Cited by: 2
Authors
Araujo, Daniella Castro [1, 4]
Rocha, Bruno Aragao [2 ]
Gomes, Karina Braga [3 ]
da Silva, Daniel Noce [1 ]
Ribeiro, Vinicius Moura [1 ]
Kohara, Marco Aurelio [1 ]
Marana, Fernanda Tostes [1 ]
Bitar, Renata Andrade [1 ]
Veloso, Adriano Alonso [4 ]
Pintao, Maria Carolina [2 ]
da Silva, Flavia Helena [2 ]
Viana, Celso Ferraz [2 ]
de Souza, Pedro Henrique Araujo [1, 5]
da Silva, Ismael Dale Cotrim Guerreiro [2, 6]
Affiliations
[1] Huna, Sao Paulo, Brazil
[2] Grp Fleury, Sao Paulo, Brazil
[3] Univ Fed Minas Gerais UFMG, Fac Farm, Dept Anal Clin & Toxicol, Campus Belo Horizonte, Belo Horizonte, MG, Brazil
[4] Univ Fed Minas Gerais UFMG, Dept Ciencias Computacao, Inst Ciencias Exatas, Campus Belo Horizonte, Belo Horizonte, MG, Brazil
[5] Inst Nacl Canc INCA, Dept Oncol Clin Res, Rio De Janeiro, Brazil
[6] Univ Fed Sao Paulo, Dept Gynecol, Escola Paulista Med, Sao Paulo, Brazil
Source
SCIENTIFIC REPORTS | 2024 / Vol. 14 / Issue 01
Funding
São Paulo Research Foundation, Brazil
Keywords
Breast cancer; Screening; Machine learning; Risk stratification; Routine blood tests; CBC; NLR; RBC; CBC-ratios; Hemogram;
DOI
10.1038/s41598-024-61215-y
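Chinese Library Classification number
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences]
Discipline classification code
07; 0710; 09
Abstract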
Optimizing early breast cancer (BC) detection requires effective risk assessment tools. This retrospective study from Brazil shows that machine learning can discern complex patterns within routine blood tests, offering a globally accessible and cost-effective approach to risk evaluation. We analyzed complete blood count (CBC) tests from 396,848 women aged 40-70 who underwent breast imaging or biopsy within six months after their CBC test. Of these, 2861 (0.72%) were identified as cases: 1882 with BC confirmed by anatomopathological tests, and 979 with highly suspicious imaging (BI-RADS 5). The remaining 393,987 participants (99.28%), with BI-RADS 1 or 2 results, were classified as controls. The database was divided into modeling (training and validation) and testing sets based on diagnostic certainty. The testing set comprised cases confirmed by anatomopathology and controls who remained cancer-free for 4.5-6.5 years after the CBC. Our ridge regression model, incorporating neutrophil-lymphocyte ratio (NLR), red blood cells (RBC), and age, achieved an AUC of 0.64 (95% CI 0.64-0.65). These results are slightly better than those of a gradient boosting model, LightGBM, and the ridge model has the added benefit of being fully interpretable. Using the probabilistic output of this model, we divided the study population into four risk groups (high, moderate, average, and low), which obtained relative ratios of BC of 1.99, 1.32, 1.02, and 0.42, respectively. The aim of this stratification is to streamline prioritization, potentially improving the early detection of breast cancer, particularly in resource-limited environments. As a risk stratification tool, the model offers the potential for personalized breast cancer screening by prioritizing women according to their individual risk, marking a shift away from a uniform, population-wide strategy.
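The arithmetic behind the abstract's cohort figures and risk groups can be illustrated with a minimal Python sketch. It is not the authors' code. The cohort counts come from the abstract; everything else is an assumption made for illustration: the function names, the toy scores and labels, the cutoff values, and the reading of "relative ratio" as a group's case rate divided by the overall case rate (the abstract does not define the term or give the cutoffs).

```python
from collections import defaultdict

# Cohort counts as stated in the abstract.
confirmed_bc = 1882   # BC confirmed by anatomopathological tests
birads5 = 979         # highly suspicious imaging (BI-RADS 5)
controls = 393_987    # BI-RADS 1 or 2
cases = confirmed_bc + birads5
total = cases + controls
case_pct = round(100 * cases / total, 2)  # share of cases, in percent

GROUPS = ("high", "moderate", "average", "low")


def nlr(neutrophils, lymphocytes):
    """Neutrophil-lymphocyte ratio from two counts given in the same unit."""
    return neutrophils / lymphocytes


def assign_group(score, cutoffs):
    """Map a model score to a risk group.

    `cutoffs` holds three HYPOTHETICAL thresholds in descending order
    (high, moderate, average); the abstract does not report the real ones.
    """
    for name, cutoff in zip(GROUPS, cutoffs):
        if score >= cutoff:
            return name
    return "low"


def relative_ratios(scores, labels, cutoffs):
    """Case rate in each group divided by the overall case rate.

    This definition of "relative ratio" is an assumption, not the paper's.
    """
    overall = sum(labels) / len(labels)
    n_in_group = defaultdict(int)
    cases_in_group = defaultdict(int)
    for score, label in zip(scores, labels):
        group = assign_group(score, cutoffs)
        n_in_group[group] += 1
        cases_in_group[group] += label
    return {
        g: (cases_in_group[g] / n_in_group[g]) / overall
        for g in GROUPS
        if n_in_group[g]
    }


# Toy data, invented purely to exercise the functions (1 = case, 0 = control).
toy_scores = [0.9, 0.8, 0.6, 0.5, 0.3, 0.2, 0.1, 0.05]
toy_labels = [1, 1, 1, 0, 0, 1, 0, 0]
toy_ratios = relative_ratios(toy_scores, toy_labels, cutoffs=(0.7, 0.4, 0.15))
```

With the abstract's counts, `cases` is 2861, `total` is 396,848, and `case_pct` is 0.72, matching the reported figures. On the toy data the high-risk group has twice the overall case rate and the low-risk group has none, which is the kind of gradient the reported ratios of 1.99 down to 0.42 describe.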
Pages: 10