A New Breast Cancer Discovery Strategy: A Combined Outlier Rejection Technique and an Ensemble Classification Method

Authors
Ali, Shereen H. [1]
Shehata, Mohamed [2]
Affiliations
[1] Delta Higher Inst Engn & Technol, Commun & Elect Engn Dept, Mansoura 35511, Egypt
[2] Univ Louisville, Speed Sch Engn, Dept Bioengn, Louisville, KY 40292 USA
Source
BIOENGINEERING-BASEL | 2024 / Vol. 11 / Issue 11
Keywords
breast cancer; data mining; feature selection; outlier rejection; Harris hawk optimization; ensemble classification
DOI
10.3390/bioengineering11111148
Chinese Library Classification (CLC) number
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology]
Discipline classification codes
071005; 0836; 090102; 100705
Abstract
Every year, many people worldwide die of breast cancer, one of the most prevalent cancers in the world. Because the disease is becoming more common, early detection is essential to avoiding serious complications and possibly death. This research presents a novel Breast Cancer Discovery (BCD) strategy that aids patients by providing prompt and sensitive detection of breast cancer. The BCD consists of two primary steps: the Pre-processing Step (P2S) and the Breast Cancer Discovery Step (BCDS). In the P2S, the needed data are separated from non-informative data using three primary operations: data normalization, feature selection, and outlier rejection. Only then is the diagnostic model in the BCDS trained for precise diagnosis. The primary contribution of this research is a novel outlier rejection technique, the Combined Outlier Rejection Technique (CORT). CORT has two primary phases: (i) the Quick Rejection Phase (QRP), a fast phase that uses a statistical method, and (ii) the Accurate Rejection Phase (ARP), a precise phase that uses an optimization method. Outliers are rapidly eliminated during the QRP using the standard deviation, and the remaining outliers are thoroughly eliminated during the ARP via Binary Harris Hawk Optimization (BHHO). Within the P2S, data normalization first rescales the numeric values in the dataset so that they fall into a predetermined range. Information Gain (IG) is then used to choose the optimal subset of features, and CORT is used to reject incorrect training data. Finally, based on the filtered data from the P2S, an Ensemble Classification Method (ECM) is used in the BCDS to identify breast cancer patients. This method consists of three classifiers: Naïve Bayes (NB), K-Nearest Neighbors (KNN), and Support Vector Machine (SVM).
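The pre-processing and voting ideas described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the threshold `k`, the use of per-feature (rather than per-class) statistics, and the majority-vote combination rule are all assumptions made for illustration, since the abstract states only that the QRP uses the standard deviation and that the ECM combines three classifiers. The BHHO-based ARP and the IG feature selection are not shown.

```python
from collections import Counter
from statistics import mean, pstdev


def min_max_normalize(rows):
    """Rescale every feature column to [0, 1]; constant columns map to 0.0."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    highs = [max(c) for c in cols]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


def quick_reject(rows, k=3.0):
    """Return indices of rows kept by a standard-deviation filter.

    Assumption: a row is an outlier if any feature lies more than
    k population standard deviations from that feature's mean.
    """
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sigmas = [pstdev(c) for c in cols]
    return [
        i for i, row in enumerate(rows)
        if all(abs(v - mu) <= k * s for v, mu, s in zip(row, mus, sigmas))
    ]


def majority_vote(predictions):
    """Combine one label per classifier (e.g. NB, KNN, SVM) into one label.

    Assumption: simple majority voting; the abstract does not name the rule.
    """
    return Counter(predictions).most_common(1)[0][0]
```

With three classifiers and two classes, a majority vote never ties, which is one reason an odd number of base classifiers is a common choice for this kind of ensemble.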
The Wisconsin Breast Cancer Database (WBCD), which contains digital images of fine-needle aspiration samples collected from patients' breast masses, is used herein to compare the BCD strategy against several contemporary strategies. According to the experimental results, the proposed method is very competitive: it achieves 0.987 accuracy, 0.013 error, 0.98 recall, 0.984 precision, and a run time of 3 s, outperforming the compared methods from the literature.
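For readers unfamiliar with the reported metrics, the sketch below shows how accuracy, error, recall, and precision are conventionally computed from predicted and true labels. The function name and the `positive` label are hypothetical, and the labels in the test are toy values, not data from the paper. Note that the reported error (0.013) equals one minus the reported accuracy (0.987), which matches the standard definition used here.

```python
def classification_metrics(y_true, y_pred, positive="malignant"):
    """Compute accuracy, error, recall, and precision for a binary task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)

    accuracy = (tp + tn) / len(pairs)
    return {
        "accuracy": accuracy,
        "error": 1.0 - accuracy,
        # Guard against division by zero when a class is never seen or predicted.
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
    }
```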
Pages: 19