Unbalanced breast cancer data classification using novel fitness functions in genetic programming

Cited by: 70
Authors
Devarriya, Divyaansh [1]
Gulati, Cairo [1]
Mansharamani, Vidhi [1]
Sakalle, Aditi [2]
Bhardwaj, Arpit [1]
Affiliations
[1] Bennett Univ, Comp Sci Engn Dept, Greater Noida, India
[2] Acropolis Tech Campus, Elect Engn Dept, Indore, Madhya Pradesh, India
Keywords
Breast cancer; Unbalanced data; Genetic programming; Fitness function; DIAGNOSIS; EVOLUTION; SELECTION; NETWORK; SYSTEM;
DOI
10.1016/j.eswa.2019.112866
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Breast cancer is a common disease, and to prevent it the disease must be identified at an early stage. Available breast cancer datasets are unbalanced in nature, i.e., there are more instances of benign (non-cancerous) cases than malignant (cancerous) ones. It is therefore challenging for most machine learning (ML) models to separate benign from malignant cases properly, even when they report high accuracy. Accuracy is not a good metric for assessing ML models on breast cancer datasets because the class imbalance biases it toward the majority class. To address this issue, we use Genetic Programming (GP) and propose two fitness functions. The first is the F2 score, which focuses on learning more about the minority class, which contains more relevant information. The second is a novel fitness function, the Distance score (D score), which learns about both classes by giving them equal importance and being unbiased. The GP framework that implements the D score is named D-score GP (DGP), and the framework that implements the F2 score is named F2GP. The proposed F2GP achieved a maximum accuracy of 99.63%, 99.51% and 100% for the 60-40 partition, 70-30 partition and 10-fold cross-validation schemes, respectively, and DGP achieved a maximum accuracy of 99.63%, 98.5% and 100% for the same three schemes. The proposed models also achieve a recall of 100% for all test cases. This shows that using a new fitness function for unbalanced data classification improves the performance of a classifier. (C) 2019 Elsevier Ltd. All rights reserved.
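The abstract's contrast between accuracy and the F2 score can be illustrated with a minimal sketch. It uses the standard F-beta definition with beta = 2, which weights recall on the minority (malignant) class more heavily than precision. The helper names (`confusion_counts`, `accuracy`, `f_beta`) and the 90/10 toy labels are assumptions made for illustration; they are not the paper's code, data, or results, and the D score is not shown because the abstract does not define it.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn), treating `positive` as the minority class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p != positive)
    return tp, fp, fn, tn


def accuracy(y_true, y_pred):
    """Fraction of labels predicted correctly."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def f_beta(y_true, y_pred, beta=2.0, positive=1):
    """Standard F-beta score; beta=2 gives the F2 score."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred, positive)
    b2 = beta * beta
    denom = (1 + b2) * tp + b2 * fn + fp
    # Convention assumed here: score is 0 when nothing is predicted or present.
    return (1 + b2) * tp / denom if denom else 0.0


# Hypothetical toy labels: 90 benign (0) and 10 malignant (1).
y_true = [0] * 90 + [1] * 10

# Classifier A predicts "benign" for every case.
pred_a = [0] * 100

# Classifier B raises 5 false alarms and finds 9 of the 10 malignant cases.
pred_b = [1] * 5 + [0] * 85 + [1] * 9 + [0] * 1

print("A: accuracy", accuracy(y_true, pred_a), "F2", f_beta(y_true, pred_a))
print("B: accuracy", accuracy(y_true, pred_b), "F2", round(f_beta(y_true, pred_b), 4))
```

On this toy data, classifier A reaches 90% accuracy while missing every malignant case, so its F2 score is 0. Classifier B is only slightly more accurate but scores far higher on F2, which is the behaviour a GP fitness function aimed at the minority class would reward.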
Pages: 11