Unbalanced breast cancer data classification using novel fitness functions in genetic programming

Cited by: 70
Authors
Devarriya, Divyaansh [1]
Gulati, Cairo [1]
Mansharamani, Vidhi [1]
Sakalle, Aditi [2]
Bhardwaj, Arpit [1]
Affiliations
[1] Bennett Univ, Comp Sci Engn Dept, Greater Noida, India
[2] Acropolis Tech Campus, Elect Engn Dept, Indore, Madhya Pradesh, India
Keywords
Breast cancer; Unbalanced data; Genetic programming; Fitness function; DIAGNOSIS; EVOLUTION; SELECTION; NETWORK; SYSTEM;
DOI
10.1016/j.eswa.2019.112866
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Breast cancer is a common disease, and to prevent it, the disease must be identified at an early stage. Available breast cancer datasets are unbalanced, i.e., they contain more instances of benign (non-cancerous) cases than malignant (cancerous) ones. It is therefore challenging for most machine learning (ML) models to distinguish benign from malignant cases properly, even when they report high accuracy. Accuracy is not a good metric for assessing ML models on breast cancer datasets, because the class imbalance biases the results. To address this issue, we use Genetic Programming (GP) and propose two fitness functions. The first is the F2 score, which focuses on learning more about the minority class, which contains more relevant information. The second is a novel fitness function, the Distance score (D score), which learns about both classes by giving them equal importance and being unbiased. The GP framework implemented with the D score is named D-score GP (DGP), and the framework implemented with the F2 score is named F2GP. The proposed F2GP achieves a maximum accuracy of 99.63%, 99.51% and 100% for the 60-40 and 70-30 partition schemes and the 10-fold cross-validation scheme, respectively, and DGP achieves a maximum accuracy of 99.63%, 98.5% and 100% for the same three schemes, respectively. The proposed models also achieve a recall of 100% for all test cases. This shows that using a new fitness function for unbalanced data classification improves the performance of a classifier. (C) 2019 Elsevier Ltd. All rights reserved.
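The abstract's point about accuracy versus the F2 score can be illustrated with a small sketch. It uses the standard F-beta definition with beta = 2, which weights recall more heavily than precision. The helper names (`fbeta_score`, `accuracy`), the toy labels, and the choice of 1 for the malignant class are illustrative assumptions; the abstract does not state how the authors compute or apply the score inside their GP framework, and the D score is not defined there, so it is not shown.

```python
def accuracy(y_true, y_pred):
    """Fraction of labels predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def fbeta_score(y_true, y_pred, beta=2.0, positive=1):
    """Standard F-beta score for the `positive` (here: malignant) class.

    With beta = 2 this is the F2 score, which penalises false negatives
    (missed malignant cases) more than false positives.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    b2 = beta ** 2
    denom = (1 + b2) * tp + b2 * fn + fp
    return (1 + b2) * tp / denom if denom else 0.0


# Toy unbalanced sample (assumed data): 2 malignant (1), 8 benign (0).
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]

# A classifier that always predicts "benign" misses every malignant case.
all_benign = [0] * 10

# A classifier that finds both malignant cases at the cost of 2 false alarms.
cautious = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]

for name, pred in [("all_benign", all_benign), ("cautious", cautious)]:
    print(name, accuracy(y_true, pred), round(fbeta_score(y_true, pred), 4))
```

Both toy classifiers have the same accuracy on this sample, yet the F2 score separates them sharply, which is the kind of behaviour a fitness function for unbalanced data needs.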
Pages: 11