Feature Selection and Comparative Analysis of Breast Cancer Prediction Using Clinical Data and Histopathological Whole Slide Images

Authors
Mohammed, Sarfaraz Ahmed [1]
Abeysinghe, Senuka [2]
Ralescu, Anca [1]
Affiliations
[1] Univ Cincinnati, Dept Comp Sci, Cincinnati, OH 45221 USA
[2] Indian Hill High Sch, Ohio's College Credit Plus Program, Cincinnati, OH 45243 USA
Keywords
Breast cancer; Machine learning; Principal component analysis; Particle swarm optimization; Feature selection; Logistic regression; Naïve Bayes classification; k-NN; Support vector machines; Random forest; K-Means; Whole slide images; TCGA; Histopathology; Deep learning; Digital image analysis; Convolutional neural network; H&E-stained images; Nuclei segmentation
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Theory of artificial intelligence]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Breast carcinoma is a common cancer among women, with invasive ductal carcinoma and lobular carcinoma being the two most frequent types. Early detection is critical for preventing the disease from progressing. Diagnostic tests include mammography, ultrasound, MRI, and biopsy. Machine learning algorithms can play a key role in analyzing complex clinical datasets to predict disease outcomes. This study applies machine learning and deep learning techniques to publicly available clinical and medical image data. For the clinical data, Principal Component Analysis (PCA) and Particle Swarm Optimization (PSO) are applied to the Wisconsin Diagnostic Breast Cancer (WDBC) dataset for feature selection, and the performance of each method in distinguishing between benign and malignant tumors is evaluated. The results show that the Random Forest (RF) classifier outperforms the other classification algorithms with both PSO and PCA feature selection, achieving predictive accuracies of 95.7% and 97.2%, respectively. The first part of the paper presents a comprehensive analysis of the two feature selection methods on clinical data to optimize predictive performance.

The second part of the paper is concerned with image data. Although histopathological whole slide imaging (WSI) has been validated for a variety of pathological applications for over two decades, manual detection of cancerous tumors remains challenging and prone to human error. Deep learning models have the potential to help pathologists detect cancer subtypes, and current image analysis techniques are increasingly able to infer the underlying genomic data and cancer-causing mutations. The second half of the paper therefore focuses on feature extraction using a deep convolutional neural network (U-Net) trained on WSIs from The Cancer Genome Atlas (TCGA) to accurately classify the disease type and extract relevant features.
The focus is on feature extraction, nuclei-based instance segmentation, H&E-stained image extraction, and quantification of intensity information for a given WSI in order to classify the disease type. A comprehensive analysis of feature selection methods is presented for both the clinical and the medical image data.
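The abstract does not state how many principal components were retained for the WDBC clinical data. The following is a minimal NumPy sketch of PCA-based dimensionality reduction, assuming standardized features and a hypothetical 95% cumulative explained-variance cutoff; `pca_fit_transform` and `var_threshold` are illustrative names, not the authors' code. In scikit-learn, `sklearn.datasets.load_breast_cancer` loads WDBC and `sklearn.decomposition.PCA(n_components=0.95)` applies the same kind of variance cutoff.

```python
import numpy as np


def pca_fit_transform(X, var_threshold=0.95):
    """Project X onto the fewest principal components whose cumulative
    explained variance reaches var_threshold.

    X: (n_samples, n_features), e.g. the 30 real-valued WDBC features.
    Returns (Z, explained_variance_ratio, k).
    """
    X = np.asarray(X, dtype=float)
    # Standardize: WDBC features have very different scales (area vs.
    # smoothness), and unscaled PCA would be dominated by the largest ones.
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    sigma[sigma == 0] = 1.0  # guard against constant columns
    Xs = (X - mu) / sigma
    # Rows of Vt are the principal axes; squared singular values are
    # proportional to the variance captured by each axis.
    _, s, Vt = np.linalg.svd(Xs, full_matrices=False)
    ratio = s ** 2 / np.sum(s ** 2)
    # Smallest k with cumulative ratio >= threshold (the 0.95 default is an
    # assumption; the abstract does not say how many components were kept).
    k = int(np.searchsorted(np.cumsum(ratio), var_threshold)) + 1
    k = min(k, len(ratio))
    return Xs @ Vt[:k].T, ratio, k
```

The reduced matrix `Z` would then be passed to a classifier such as a Random Forest. When estimating accuracy, the standardization and projection should be fitted on the training folds only, so that test data does not leak into the components.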
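The abstract also does not specify the PSO variant or its hyperparameters. A common way to use PSO for feature selection is the binary formulation, in which each particle is a 0/1 mask over the features and a sigmoid of the particle's velocity gives the probability of keeping each feature. The sketch below assumes that formulation; `binary_pso_select`, the `fitness` callback, and the default coefficients (`w`, `c1`, `c2`, `v_max`) are illustrative assumptions, not values from the paper.

```python
import numpy as np


def binary_pso_select(fitness, n_features, n_particles=20, n_iter=50,
                      w=0.7, c1=1.5, c2=1.5, v_max=4.0, seed=0):
    """Search for a feature mask that maximizes fitness(mask).

    fitness: callable taking a bool array of length n_features and
    returning a float (higher is better). Returns (best_mask, best_score).
    Coefficients are illustrative defaults, not values from the paper.
    """
    rng = np.random.default_rng(seed)
    # Positions are 0/1 masks; velocities are real-valued.
    X = rng.random((n_particles, n_features)) < 0.5
    V = rng.uniform(-1.0, 1.0, size=(n_particles, n_features))
    scores = np.array([fitness(x) for x in X], dtype=float)
    pbest, pbest_score = X.copy(), scores.copy()
    g = int(np.argmax(pbest_score))
    gbest, gbest_score = pbest[g].copy(), pbest_score[g]

    for _ in range(n_iter):
        Xf = X.astype(float)
        r1 = rng.random(V.shape)
        r2 = rng.random(V.shape)
        # Velocity update: inertia + pull toward personal and global bests.
        V = (w * V
             + c1 * r1 * (pbest.astype(float) - Xf)
             + c2 * r2 * (gbest.astype(float) - Xf))
        V = np.clip(V, -v_max, v_max)
        # Sigmoid transfer: keep feature j with probability sigmoid(V[:, j]).
        X = rng.random(V.shape) < 1.0 / (1.0 + np.exp(-V))
        scores = np.array([fitness(x) for x in X], dtype=float)
        # Update each particle's personal best, then the swarm's global best.
        improved = scores > pbest_score
        pbest[improved] = X[improved]
        pbest_score[improved] = scores[improved]
        g = int(np.argmax(pbest_score))
        if pbest_score[g] > gbest_score:
            gbest, gbest_score = pbest[g].copy(), pbest_score[g]
    return gbest, float(gbest_score)
```

For WDBC, a plausible (assumed) fitness is cross-validated accuracy on the selected columns, for example `cross_val_score(RandomForestClassifier(), X[:, mask], y, cv=5).mean()` from scikit-learn, returning 0.0 when the mask selects no features.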
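For the image data, the abstract mentions extracting H&E-stained content and quantifying intensity information, but not the method used. One standard way to separate the hematoxylin (nuclear) and eosin signals is Ruifrok and Johnston color deconvolution: RGB values are converted to optical density with the Beer-Lambert law and then unmixed with the inverse of a stain matrix. The sketch below assumes this technique and the commonly used default stain vectors (the ones behind scikit-image's `skimage.color.rgb2hed`); `color_deconvolve`, `hematoxylin_stats`, and its `threshold` are hypothetical names and values.

```python
import numpy as np

# Commonly used Ruifrok-Johnston optical-density vectors (assumed defaults).
# Rows: hematoxylin, eosin, DAB; columns: R, G, B.
RGB_FROM_HED = np.array([[0.65, 0.70, 0.29],
                         [0.07, 0.99, 0.11],
                         [0.27, 0.57, 0.78]])


def optical_density(rgb):
    """Beer-Lambert optical density per channel for 8-bit RGB input."""
    rgb = np.asarray(rgb, dtype=float)
    # Clip to 1 so fully dark (0) pixels do not produce log(0).
    return -np.log10(np.clip(rgb, 1.0, 255.0) / 255.0)


def color_deconvolve(rgb):
    """Return per-pixel (H, E, DAB) stain amounts with shape (..., 3)."""
    od = optical_density(rgb)
    # OD = C @ RGB_FROM_HED, hence C = OD @ inv(RGB_FROM_HED).
    return od @ np.linalg.inv(RGB_FROM_HED)


def hematoxylin_stats(rgb, threshold=0.1):
    """Summarize the hematoxylin (nuclear) signal of an RGB tile."""
    h = color_deconvolve(rgb)[..., 0]
    mask = h > threshold  # crude nuclear-stain mask; threshold is illustrative
    return {
        "mean_h": float(h.mean()),
        "nuclear_fraction": float(mask.mean()),
        "mean_h_in_mask": float(h[mask].mean()) if mask.any() else 0.0,
    }
```

Because the unmixing is linear in optical density, a pixel produced by hematoxylin alone maps back to a pure hematoxylin amount with near-zero eosin and DAB, while white background maps to zero for all stains.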
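The abstract states that a U-Net is used for nuclei-based instance segmentation but does not describe how individual nuclei are obtained from the network output. A common post-processing step, assumed here, is to threshold a per-pixel nucleus probability map, split the foreground into connected components, and then measure each nucleus. This sketch does not implement the U-Net itself; the input `prob`, the `threshold`, `min_area`, and the 4-connectivity are hypothetical choices. In practice, `scipy.ndimage.label` or `skimage.measure.label` together with `skimage.measure.regionprops` performs the same labeling and measurement.

```python
from collections import deque

import numpy as np


def label_instances(prob, threshold=0.5, min_area=1):
    """Turn a nucleus probability map into an instance label map.

    Pixels above threshold are grouped into 4-connected components;
    components smaller than min_area are discarded (0 = background).
    Returns (labels, n_instances).
    """
    fg = np.asarray(prob) > threshold
    labels = np.zeros(fg.shape, dtype=int)
    rows, cols = fg.shape
    next_label = 1
    for r in range(rows):
        for c in range(cols):
            if not fg[r, c] or labels[r, c]:
                continue
            # Breadth-first flood fill of one connected component.
            queue, pixels = deque([(r, c)]), []
            labels[r, c] = next_label
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and fg[ny, nx] and not labels[ny, nx]):
                        labels[ny, nx] = next_label
                        queue.append((ny, nx))
            if len(pixels) < min_area:
                for y, x in pixels:
                    labels[y, x] = -1  # visited but rejected as too small
            else:
                next_label += 1
    labels[labels < 0] = 0
    return labels, next_label - 1


def instance_features(labels, intensity, n):
    """Area and mean intensity (e.g. hematoxylin) for nuclei 1..n."""
    intensity = np.asarray(intensity, dtype=float)
    feats = []
    for k in range(1, n + 1):
        m = labels == k
        feats.append({"label": k, "area": int(m.sum()),
                      "mean_intensity": float(intensity[m].mean())})
    return feats
```

Per-nucleus features such as these could then be aggregated per WSI (for example as means or histograms) and passed to a feature selection step and classifier, in the same way as the clinical features.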
Pages: 1494-1525
Number of pages: 32