Variance Ranking Attributes Selection Techniques for Binary Classification Problem in Imbalance Data

Cited by: 76
Authors
Ebenuwa, Solomon H. [1]
Sharif, Mhd Saeed [1]
Alazab, Mamoun [2]
Al-Nemrat, Ameer [1]
Affiliations
[1] Univ East London, Sch Architecture Comp & Engn, London E16 2RD, England
[2] Charles Darwin Univ, Coll Engn IT & Environm, Casuarina, NT 0810, Australia
Keywords
Imbalanced dataset; class distribution; binary class; imbalance ratio; majority class; minority class; oversampling; undersampling; logistic regression; support vector machine; decision tree; ranked order similarity; peak threshold accuracy; PREDICTION; DISCRETE
DOI
10.1109/ACCESS.2019.2899578
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Data are being generated and used to support all aspects of healthcare provision, from policy formation to the delivery of primary care services. In particular, as the emphasis shifts from curative to preventive medicine, the growing importance of data-based research such as data mining and machine learning has drawn attention to the issue of class distribution in datasets. In typical predictive modeling, the inability to effectively address class imbalance in a real-life dataset is an important shortcoming of existing machine learning algorithms. Most algorithms assume balanced classes in their design, resulting in poor performance in predicting the minority target class. Ironically, the minority target class is usually the focus of the prediction task. Misclassification of the minority target class has serious consequences in detecting chronic diseases, fraud, and intrusion, where positive cases are erroneously predicted as not positive. This paper presents a new attribute selection technique called variance ranking for handling class imbalance problems in a dataset. The results obtained were compared to two well-known attribute selection techniques: Pearson correlation and information gain. This paper uses a novel similarity measurement technique, ranked order similarity (ROS), to evaluate variance ranking attribute selection against Pearson correlation and information gain. Further validation was carried out using three binary classifiers: logistic regression, support vector machine, and decision tree. The proposed variance ranking and ranked order similarity techniques showed better results than the benchmarks. The ROS technique provided an excellent means of grading and measuring similarities where other similarity measurement techniques were inadequate or not applicable.
Pages: 24649 - 24666
Page count: 18
Related papers
50 in total
  • [21] An empirical study on the joint impact of feature selection and data resampling on imbalance classification
    Zhang, Chongsheng
    Soda, Paolo
    Bi, Jingjun
    Fan, Gaojuan
    Almpanidis, George
    Garcia, Salvador
    Ding, Weiping
    APPLIED INTELLIGENCE, 2023, 53 (05) : 5449 - 5461
  • [22] A Ranking Approach for Probe Selection and Classification of Microarray Data with Artificial Neural Networks
    Chagas Faria, Alexandre Wagner
    Da Silva, Alisson Marques
    Rodrigues, Thiago de Souza
    Costa, Marcelo Azevedo
    Braga, Antonio Padua
    JOURNAL OF COMPUTATIONAL BIOLOGY, 2015, 22 (10) : 953 - 961
  • [23] Artificial Neural Networks and Ranking Approach for Probe Selection and Classification of Microarray Data
    Silva, Alisson Marques
    Faria, Alexandre Wagner C.
    Rodrigues, Thiago de Souza
    Costa, Marcelo Azevedo
    Braga, Antonio de Padua
    2013 1ST BRICS COUNTRIES CONGRESS ON COMPUTATIONAL INTELLIGENCE AND 11TH BRAZILIAN CONGRESS ON COMPUTATIONAL INTELLIGENCE (BRICS-CCI & CBIC), 2013 : 598 - 603
  • [24] GMDH-based feature ranking and selection for improved classification of medical data
    Abdel-Aal, RE
    JOURNAL OF BIOMEDICAL INFORMATICS, 2005, 38 (06) : 456 - 468
  • [25] A Combined Clustering and Ranking based Gene Selection Algorithm for Microarray Data Classification
    Rani, M. Jansi
    Devaraj, D.
    2017 IEEE INTERNATIONAL CONFERENCE ON COMPUTATIONAL INTELLIGENCE AND COMPUTING RESEARCH (ICCIC), 2017 : 183 - 187
  • [26] GEV-NN: A deep neural network architecture for class imbalance problem in binary classification
    Munkhdalai, Lkhagvadorj
    Munkhdalai, Tsendsuren
    Ryu, Keun Ho
    KNOWLEDGE-BASED SYSTEMS, 2020, 194
  • [27] Ensemble Meta Classifier with Sampling and Feature Selection for Data with Multiclass Imbalance Problem
    Sainin, Mohd Shamrie
    Alfred, Rayner
    Ahmad, Faudziah
    JOURNAL OF INFORMATION AND COMMUNICATION TECHNOLOGY-MALAYSIA, 2021, 20 (02) : 103 - 133
  • [28] Dealing with the Data Imbalance Problem in Pulsar Candidate Sifting Based on Feature Selection
    Lin, Haitao
    Li, Xiangru
    RESEARCH IN ASTRONOMY AND ASTROPHYSICS, 2024, 24 (02)
  • [29] Dealing with the Data Imbalance Problem in Pulsar Candidate Sifting Based on Feature Selection
    Haitao Lin
    Xiangru Li
    Research in Astronomy and Astrophysics, 2024, 24 (02) : 127 - 139
  • [30] A Novel SMOTE-Based Classification Approach to Online Data Imbalance Problem
    Gong, Chunlin
    Gu, Liangxian
    MATHEMATICAL PROBLEMS IN ENGINEERING, 2016, 2016