Feature Selection Using Correlation Analysis and Principal Component Analysis for Accurate Breast Cancer Diagnosis

Cited by: 20
Authors
Ibrahim, Sara [1]
Nazir, Saima [2]
Velastin, Sergio A. [3, 4]
Affiliations
[1] Capital Univ Sci & Technol, Dept Comp Sci, Islamabad 45730, Pakistan
[2] Natl Univ Modern Languages, Dept Software Engn, Rawalpindi 46000, Pakistan
[3] Univ Carlos III Madrid, Dept Comp Sci & Engn, Appl Artificial Intelligence Res Grp, Madrid 28270, Spain
[4] Queen Mary Univ London, Sch Elect Engn & Comp Sci, London E1 4NS, England
Keywords
breast cancer diagnosis; Wisconsin Breast Cancer Dataset; feature selection; dimensionality reduction; principal component analysis; ensemble method
DOI
10.3390/jimaging7110225
Chinese Library Classification
TB8 [Photographic technology]
Discipline classification code
0804
Abstract
Breast cancer is a leading cause of death among women, more so than any other cancer. Accurate diagnosis of breast cancer is very difficult because of the complexity of the disease, changing treatment procedures and differing patient population samples. Diagnostic techniques with better performance are very important for personalized care and treatment, and for reducing and controlling the recurrence of cancer. The main objective of this research was to select features using correlation analysis and the variance of the input features before passing the significant features to a classification method. We used an ensemble method to improve the classification of breast cancer. The proposed approach was evaluated on the public Wisconsin Breast Cancer Dataset (WBCD). Correlation analysis and principal component analysis were used for dimensionality reduction. Performance was evaluated for well-known machine learning classifiers, and the seven best classifiers were chosen for the next step. Hyper-parameter tuning was performed to improve the performance of these classifiers. The best-performing classification algorithms were then combined using two different voting techniques. Hard voting predicts the class that receives the majority of the votes, whereas soft voting predicts the class with the highest predicted probability. The proposed approach performed better than state-of-the-art work, achieving an accuracy of 98.24%, a precision of 99.29% and a recall of 95.89%.
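The difference between the two voting techniques named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `hard_vote` and `soft_vote`, the three made-up probability vectors, and the choice to combine probabilities by a simple unweighted average are all assumptions made for illustration.

```python
from collections import Counter


def hard_vote(probas, classes):
    """Each classifier votes for its most probable class; the majority wins."""
    votes = [classes[max(range(len(p)), key=p.__getitem__)] for p in probas]
    return Counter(votes).most_common(1)[0][0]


def soft_vote(probas, classes):
    """Average class probabilities over classifiers; the highest mean wins.

    Unweighted averaging is an assumption of this sketch.
    """
    n = len(probas)
    means = [sum(p[i] for p in probas) / n for i in range(len(classes))]
    return classes[max(range(len(classes)), key=means.__getitem__)]


# Hypothetical outputs of three classifiers for one sample,
# ordered as [P(benign), P(malignant)]. These numbers are invented.
classes = ["benign", "malignant"]
probas = [
    [0.10, 0.90],  # classifier A: confident "malignant"
    [0.60, 0.40],  # classifier B: weak "benign"
    [0.60, 0.40],  # classifier C: weak "benign"
]

hard = hard_vote(probas, classes)
soft = soft_vote(probas, classes)
print("hard voting:", hard)
print("soft voting:", soft)
```

In this invented case the two rules disagree: two of the three classifiers lean weakly towards "benign", so hard voting follows the majority, while the single confident classifier pulls the averaged probability towards "malignant" under soft voting.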
Pages: 16