A New Breast Cancer Discovery Strategy: A Combined Outlier Rejection Technique and an Ensemble Classification Method

Times cited: 0
Authors
Ali, Shereen H. [1]
Shehata, Mohamed [2]
Affiliations
[1] Delta Higher Inst Engn & Technol, Commun & Elect Engn Dept, Mansoura 35511, Egypt
[2] Univ Louisville, Speed Sch Engn, Dept Bioengn, Louisville, KY 40292 USA
Source
BIOENGINEERING-BASEL | 2024 / Vol. 11 / No. 11
Keywords
breast cancer; data mining; feature selection; outlier rejection; Harris hawk optimization; ensemble classification
DOI
10.3390/bioengineering11111148
Chinese Library Classification (CLC) numbers
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology]
Discipline classification codes
071005; 0836; 090102; 100705
Abstract
Breast cancer is one of the most prevalent cancers in the world and claims many lives every year. Because the disease is becoming more common, early detection is essential to avoid serious complications and, potentially, death. This research proposes a novel Breast Cancer Discovery (BCD) strategy that supports patients through prompt and sensitive detection of breast cancer. The BCD consists of two primary steps: the Pre-processing Step (P2S) and the Breast Cancer Discovery Step (BCDS). In the P2S, non-informative data are filtered out through three primary operations: data normalization, feature selection, and outlier rejection. Only then is the diagnostic model in the BCDS trained for precise diagnosis. The primary contribution of this research is a novel outlier rejection technique called the Combined Outlier Rejection Technique (CORT). CORT is divided into two phases: (i) the Quick Rejection Phase (QRP), a fast phase based on a statistical method, and (ii) the Accurate Rejection Phase (ARP), a precise phase based on an optimization method. During the QRP, outliers are rapidly eliminated using the standard deviation; the remaining outliers are then thoroughly eliminated during the ARP via Binary Harris Hawk Optimization (BHHO). Within the P2S, data normalization first scales the numeric values in the dataset into a predetermined range. Information Gain (IG) is then used to choose the optimal subset of features, and CORT is used to reject incorrect training data. Finally, based on the filtered data from the P2S, an Ensemble Classification Method (ECM) is used in the BCDS to identify breast cancer patients. This method consists of three classifiers: Naïve Bayes (NB), K-Nearest Neighbors (KNN), and Support Vector Machine (SVM).
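The three P2S operations described above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. Min-max scaling into [0, 1] is assumed as the normalization. IG is computed after equal-width binning into four bins. The QRP is approximated by a within-class standard-deviation rule with a hypothetical threshold z = 2. All function names are hypothetical, and the BHHO-based ARP is not reproduced.

```python
import math
from collections import Counter


def min_max_normalize(rows):
    """Scale each feature column into [0, 1]; constant columns map to 0.0."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    highs = [max(c) for c in cols]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0 for v, lo, hi in zip(r, lows, highs)]
        for r in rows
    ]


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(column, labels, bins=4):
    """IG of one feature in [0, 1] after equal-width binning (assumed discretization)."""
    binned = [min(int(v * bins), bins - 1) for v in column]
    n = len(labels)
    remainder = 0.0
    for b in set(binned):
        subset = [y for x, y in zip(binned, labels) if x == b]
        remainder += len(subset) / n * entropy(subset)
    return entropy(labels) - remainder


def select_top_k(rows, labels, k):
    """Indices of the k features with the highest information gain."""
    gains = [information_gain(col, labels) for col in zip(*rows)]
    return sorted(range(len(gains)), key=lambda j: gains[j], reverse=True)[:k]


def quick_rejection(rows, labels, z=2.0):
    """QRP-style sketch: keep a sample only if every feature lies within
    z standard deviations of its class mean. Returns the kept indices."""
    n_feat = len(rows[0])
    kept = []
    for cls in set(labels):
        idx = [i for i, y in enumerate(labels) if y == cls]
        means = [sum(rows[i][j] for i in idx) / len(idx) for j in range(n_feat)]
        stds = [
            math.sqrt(sum((rows[i][j] - means[j]) ** 2 for i in idx) / len(idx))
            for j in range(n_feat)
        ]
        for i in idx:
            if all(abs(rows[i][j] - means[j]) <= z * stds[j] for j in range(n_feat)):
                kept.append(i)
    return sorted(kept)


def preprocess(rows, labels, k, z=2.0):
    """Chain the P2S operations: normalize, keep the top-k IG features, reject outliers."""
    norm = min_max_normalize(rows)
    feats = select_top_k(norm, labels, k)
    reduced = [[r[j] for j in feats] for r in norm]
    kept = quick_rejection(reduced, labels, z)
    return [reduced[i] for i in kept], [labels[i] for i in kept], feats


if __name__ == "__main__":
    # Toy data: feature 0 separates the classes, feature 1 is mostly noise,
    # and sample 5 is an extreme value within class 0.
    rows = [[1.0, 3.0]] * 5 + [[10.0, 4.0]] + [[20.0, 3.0]] * 6
    labels = [0] * 6 + [1] * 6
    X, y, feats = preprocess(rows, labels, k=1)
    print(feats, len(X))  # [0] 11
```

In this sketch, outliers are rejected after feature selection, so the standard-deviation test only sees the retained features. This follows the order of operations given in the abstract.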
The Wisconsin Breast Cancer Database (WBCD), which contains digital images of fine-needle aspiration samples collected from patients' breast masses, is used to compare the BCD strategy against several contemporary strategies. Experimental results show that the proposed strategy is highly competitive. It achieves an accuracy of 0.987, an error rate of 0.013, a recall of 0.98, a precision of 0.984, and a run time of 3 s, outperforming all of the compared methods from the literature.
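The reported figures are standard confusion-matrix metrics. The error rate is the complement of accuracy (1 - 0.987 = 0.013). The sketch below assumes that the ECM combines its NB, KNN, and SVM outputs by hard majority voting; the abstract does not state the combination rule. It then computes the four metrics for a binary task. The function names and toy predictions are hypothetical.

```python
from collections import Counter


def majority_vote(*predictions):
    """Hard majority vote over per-classifier label lists (assumed ECM combination rule)."""
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]


def evaluate(y_true, y_pred, positive=1):
    """Accuracy, error rate, recall, and precision for a binary task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    return {
        "accuracy": accuracy,
        "error": 1 - accuracy,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
    }


if __name__ == "__main__":
    # Hypothetical predictions from three base classifiers on five samples.
    nb = [1, 0, 1, 1, 0]
    knn = [1, 1, 0, 1, 0]
    svm = [0, 0, 1, 1, 1]
    y_pred = majority_vote(nb, knn, svm)  # [1, 0, 1, 1, 0]
    print(evaluate([1, 0, 1, 0, 0], y_pred))
```

With three voters and two classes, a hard vote can never tie, so `most_common(1)` always returns a strict majority label.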
Pages: 19