Unbalanced breast cancer data classification using novel fitness functions in genetic programming

Cited by: 70
Authors
Devarriya, Divyaansh [1]
Gulati, Cairo [1]
Mansharamani, Vidhi [1]
Sakalle, Aditi [2]
Bhardwaj, Arpit [1]
Affiliations
[1] Bennett Univ, Comp Sci Engn Dept, Greater Noida, India
[2] Acropolis Tech Campus, Elect Engn Dept, Indore, Madhya Pradesh, India
Keywords
Breast cancer; Unbalanced data; Genetic programming; Fitness function; DIAGNOSIS; EVOLUTION; SELECTION; NETWORK; SYSTEM
DOI
10.1016/j.eswa.2019.112866
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Breast cancer is a common disease, and to prevent it the disease must be identified at an early stage. Available breast cancer datasets are unbalanced in nature, i.e., there are more instances of benign (non-cancerous) cases than malignant (cancerous) ones. It is therefore challenging for most machine learning (ML) models to distinguish benign from malignant cases properly, even when they achieve high accuracy. Accuracy is not a good metric for assessing ML models on breast cancer datasets because it yields biased results. To address this issue, we use Genetic Programming (GP) and propose two fitness functions. The first is the F2 score, which focuses on learning more about the minority class, which contains more relevant information. The second is a novel fitness function called the Distance score (D score), which learns about both classes by giving them equal importance and being unbiased. The GP framework implemented with the D score is named D-score GP (DGP), and the framework implemented with the F2 score is named F2GP. The proposed F2GP achieved a maximum accuracy of 99.63%, 99.51%, and 100% for the 60-40 partition, 70-30 partition, and 10-fold cross-validation schemes, respectively, and DGP achieved a maximum accuracy of 99.63%, 98.5%, and 100% for the same three schemes, respectively. The proposed models also achieve a recall of 100% for all test cases. This shows that using a new fitness function for unbalanced data classification improves the performance of a classifier. © 2019 Elsevier Ltd. All rights reserved.
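The abstract's point that accuracy can mislead on unbalanced data, and that an F2-based fitness favors the minority class, can be illustrated with a minimal sketch. It uses the standard F-beta definition with beta = 2; the toy labels, the two hypothetical classifiers, and all function names are assumptions made for illustration and are not taken from the paper. The D score is not reproduced here because the abstract does not define it, and no GP framework is shown.

```python
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, fn, tn), treating label 1 (malignant) as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of all cases classified correctly."""
    tp, _, _, tn = confusion_counts(y_true, y_pred)
    return (tp + tn) / len(y_true)


def f_beta(y_true: Sequence[int], y_pred: Sequence[int], beta: float = 2.0) -> float:
    """Standard F-beta score; beta = 2 weights recall more heavily than precision."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred)
    b2 = beta * beta
    denom = (1 + b2) * tp + b2 * fn + fp
    return (1 + b2) * tp / denom if denom else 0.0


# Toy unbalanced data (assumed): 90 benign (0) and 10 malignant (1) cases.
y_true = [0] * 90 + [1] * 10

# Hypothetical classifier A: predicts "benign" for every case.
pred_all_benign = [0] * 100

# Hypothetical classifier B: flags all 10 malignant cases plus 15 false alarms.
pred_sensitive = [1] * 15 + [0] * 75 + [1] * 10

acc_a, f2_a = accuracy(y_true, pred_all_benign), f_beta(y_true, pred_all_benign)
acc_b, f2_b = accuracy(y_true, pred_sensitive), f_beta(y_true, pred_sensitive)

print(f"A: accuracy={acc_a:.3f}, F2={f2_a:.3f}")
print(f"B: accuracy={acc_b:.3f}, F2={f2_b:.3f}")
```

In this toy setup, classifier A has the higher accuracy although it misses every malignant case, while classifier B has the higher F2 score. A fitness function built on F2 would therefore rank B above A, which is the kind of behavior the abstract argues for.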
Pages: 11