Variance Ranking Attributes Selection Techniques for Binary Classification Problem in Imbalance Data

Cited by: 76
Authors
Ebenuwa, Solomon H. [1 ]
Sharif, Mhd Saeed [1 ]
Alazab, Mamoun [2 ]
Al-Nemrat, Ameer [1 ]
Affiliations
[1] Univ East London, Sch Architecture Comp & Engn, London E16 2RD, England
[2] Charles Darwin Univ, Coll Engn IT & Environm, Casuarina, NT 0810, Australia
Keywords
Imbalanced dataset; class distribution; binary class; imbalance ratio; majority class; minority class; oversampling; under sampling; logistic regression; support vector machine; decision tree; ranked order similarity; peak threshold accuracy; PREDICTION; DISCRETE;
DOI
10.1109/ACCESS.2019.2899578
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Data are being generated and used to support all aspects of healthcare provision, from policy formation to the delivery of primary care services. In particular, with the shift in emphasis from curative to preventive medicine, the growing importance of data-based research such as data mining and machine learning has highlighted the issue of class distribution in datasets. In typical predictive modeling, the inability to effectively address class imbalance in real-life datasets is an important shortcoming of existing machine learning algorithms. Most algorithms assume a balanced class distribution in their design, resulting in poor performance when predicting the minority target class. Ironically, the minority target class is usually the focus of the prediction process. Misclassification of the minority target class has serious consequences in detecting chronic diseases, fraud, and intrusion, where positive cases are erroneously predicted as negative. This paper presents a new attribute selection technique called variance ranking for handling class imbalance problems in a dataset. The results obtained were compared with those of two well-known attribute selection techniques: Pearson correlation and information gain. This paper uses a novel similarity measurement technique, ranked order similarity (ROS), to evaluate variance ranking attribute selection against Pearson correlation and information gain. Further validation was carried out using three binary classifiers: logistic regression, support vector machine, and decision tree. The proposed variance ranking and ranked order similarity techniques showed better results than the benchmarks. The ROS technique provided an excellent means of grading and measuring similarities where other similarity measurement techniques were inadequate or not applicable.
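The abstract does not spell out how variance ranking scores an attribute, so the following is only a rough sketch of the general idea of ranking attributes by how differently they vary in the majority and minority classes. The scoring rule used here (absolute gap between per-class variances after min-max scaling) and the function names `imbalance_ratio`, `min_max_scale`, and `rank_by_class_variance_gap` are assumptions made for illustration, not the procedure published in the paper.

```python
from statistics import pvariance


def imbalance_ratio(labels):
    """Return majority-class count divided by minority-class count (binary labels)."""
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    if len(counts) != 2:
        raise ValueError("expected exactly two classes")
    return max(counts.values()) / min(counts.values())


def min_max_scale(values):
    """Scale values to [0, 1] so variances of different attributes are comparable."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def rank_by_class_variance_gap(rows, labels, names):
    """Rank attribute names by |variance in class A - variance in class B|.

    NOTE: this scoring rule is an illustrative assumption, not the
    paper's published variance ranking definition.
    """
    classes = sorted(set(labels))
    if len(classes) != 2:
        raise ValueError("expected exactly two classes")
    scores = {}
    for j, name in enumerate(names):
        column = min_max_scale([row[j] for row in rows])
        per_class = [
            pvariance([v for v, y in zip(column, labels) if y == c])
            for c in classes
        ]
        scores[name] = abs(per_class[0] - per_class[1])
    # Largest variance gap first.
    return sorted(names, key=lambda n: scores[n], reverse=True)


if __name__ == "__main__":
    # Toy data: four majority-class rows (label 0), two minority-class rows (label 1).
    rows = [(1.0, 10.0), (1.1, 30.0), (0.9, 20.0), (1.0, 40.0), (5.0, 25.0), (0.0, 26.0)]
    labels = [0, 0, 0, 0, 1, 1]
    print(imbalance_ratio(labels))
    print(rank_by_class_variance_gap(rows, labels, ["a", "b"]))
```

A ranking produced this way could then be compared with rankings from Pearson correlation or information gain, which is the kind of comparison the abstract describes; the comparison metric itself (ROS) is not reproduced here because the abstract does not define it.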
Pages: 24649-24666
Number of pages: 18