A New Breast Cancer Discovery Strategy: A Combined Outlier Rejection Technique and an Ensemble Classification Method

Cited by: 0
Authors
Ali, Shereen H. [1]
Shehata, Mohamed [2]
Affiliations
[1] Delta Higher Inst Engn & Technol, Commun & Elect Engn Dept, Mansoura 35511, Egypt
[2] Univ Louisville, Speed Sch Engn, Dept Bioengn, Louisville, KY 40292 USA
Source
BIOENGINEERING-BASEL | 2024 / Vol. 11 / Issue 11
Keywords
breast cancer; data mining; feature selection; outlier rejection; Harris hawk optimization; ensemble classification;
DOI
10.3390/bioengineering11111148
Chinese Library Classification number
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology];
Subject classification code
071005 ; 0836 ; 090102 ; 100705 ;
Abstract
Annually, many people worldwide lose their lives to breast cancer, one of the most prevalent cancers in the world. Because the disease is becoming more common, early detection is essential to avoiding serious complications and possibly death. This research presents a novel Breast Cancer Discovery (BCD) strategy that aids patients by providing prompt and sensitive detection of breast cancer. The BCD consists of two primary steps: the Pre-processing Step (P2S) and the Breast Cancer Discovery Step (BCDS). In the P2S, non-informative data are filtered out using three primary operations: data normalization, feature selection, and outlier rejection. Only then is the diagnostic model in the BCDS trained for precise diagnosis. The primary contribution of this research is a novel outlier rejection technique, the Combined Outlier Rejection Technique (CORT). CORT is divided into two primary phases: (i) the Quick Rejection Phase (QRP), a fast phase that uses a statistical method, and (ii) the Accurate Rejection Phase (ARP), a precise phase that uses an optimization method. Outliers are rapidly eliminated during the QRP using the standard deviation, and the remaining outliers are thoroughly eliminated during the ARP via Binary Harris Hawk Optimization (BHHO). Within the P2S, data normalization rescales the numeric values in the dataset so that they fall into a predetermined range. Information Gain (IG) is then used to choose the optimal subset of features, and CORT is used to reject incorrect training data. Based on the filtered data from the P2S, an Ensemble Classification Method (ECM) is then used in the BCDS to identify breast cancer patients. This method consists of three classifiers: Naïve Bayes (NB), K-Nearest Neighbors (KNN), and Support Vector Machine (SVM).
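The abstract states only that the QRP removes outliers "using the standard deviation"; it does not give the exact rule. The following is a minimal Python sketch of one common reading, assuming a row is rejected when any feature lies more than k standard deviations from that feature's mean. The function name `quick_reject`, the threshold `k = 2.0`, and the toy data are illustrative assumptions, not values taken from the paper.

```python
from statistics import mean, pstdev


def quick_reject(rows, k=2.0):
    """Keep rows whose every feature lies within k standard deviations
    of that feature's mean (k is a hypothetical threshold)."""
    columns = list(zip(*rows))              # one tuple per feature
    mus = [mean(c) for c in columns]        # per-feature mean
    sigmas = [pstdev(c) for c in columns]   # per-feature population std
    kept = []
    for row in rows:
        # A constant feature (sigma == 0) can never flag a row.
        inside = all(
            s == 0 or abs(x - m) <= k * s
            for x, m, s in zip(row, mus, sigmas)
        )
        if inside:
            kept.append(row)
    return kept


# Toy data (made up for illustration): the last row has an extreme first feature.
rows = [
    (1.0, 10.0), (1.2, 10.8), (0.9, 9.5), (1.1, 10.5),
    (1.0, 10.2), (0.8, 9.8), (9.0, 10.1),
]
filtered = quick_reject(rows)
print(len(rows), "->", len(filtered))
```

A cheap statistical filter of this kind can only catch gross outliers, which is consistent with the abstract's description of a second, optimization-based phase (ARP) that handles the remaining ones.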
The Wisconsin Breast Cancer Database (WBCD) dataset, which contains digital images of fine-needle aspiration samples collected from patients' breast masses, is used herein to compare the BCD strategy against several contemporary strategies. The experimental results show that the proposed strategy is highly competitive: it achieves 0.987 accuracy, 0.013 error, 0.98 recall, 0.984 precision, and a run time of 3 s, outperforming all the other methods from the literature that were compared.
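For readers less familiar with the reported metrics, the sketch below shows their standard definitions in terms of confusion-matrix counts, under which error is simply 1 minus accuracy (consistent with the reported 0.987 and 0.013). The function name and the counts passed to it are hypothetical and are not the paper's results.

```python
def classification_metrics(tp, fp, fn, tn):
    """Standard binary-classification metrics from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    return {
        "accuracy": accuracy,
        "error": 1 - accuracy,          # misclassification rate
        "recall": tp / (tp + fn),       # share of actual positives found
        "precision": tp / (tp + fp),    # share of predicted positives that are correct
    }


# Hypothetical counts chosen only to exercise the formulas.
m = classification_metrics(tp=50, fp=2, fn=1, tn=47)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```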
Pages: 19