Feature Selection in Imbalanced Data

Authors
Kamalov F. [1]
Thabtah F. [2]
Leung H.H. [3]
Affiliations
[1] Canadian University of Dubai, Dubai
[2] Manukau Institute of Technology, Manukau
[3] UAE University, Al Ain
Keywords
Big data; Data mining; F1-score; Feature selection; Filter method; Imbalanced data; Machine learning
DOI
10.1007/s40745-021-00366-5
Abstract
Traditional feature selection methods are poorly suited to imbalanced data because they tend to be biased towards the majority class. This problem is particularly acute in fields such as medical diagnostics and fraud detection, where the class distribution is highly skewed. In this paper, we propose a novel filter approach based on a decision tree-based F1-score. Because the F1-score combines precision and recall with respect to the minority class, it is a good measure for imbalanced data. In the proposed implementation, the F1-score is calculated using a one-dimensional decision tree classifier, resulting in a fast and effective feature evaluation method. Numerical experiments confirm that the proposed method achieves robust dimensionality reduction and accuracy. In addition, the low computational complexity of the algorithm makes it a practical choice for big data applications.
© 2021, The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature.
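The abstract describes the method only at a high level. The following is a minimal pure-Python sketch of the general idea under our own assumptions, not the authors' implementation: each feature is scored by fitting a small decision tree on that feature alone and computing the F1-score for the minority class, F1 = 2TP / (2TP + FP + FN), and features are then ranked by that score. The Gini splitting criterion, the depth limit `max_depth`, scoring on the training data rather than a held-out split, and all function names (`fit_tree_1d`, `predict_1d`, `f1_for_class`, `rank_features_by_f1`) are hypothetical choices made for illustration.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def fit_tree_1d(xs, ys, max_depth, depth=0):
    """Grow a small CART-style tree on a single feature (hypothetical helper).

    Nodes are ("node", threshold, left, right); leaves are ("leaf", label).
    """
    majority = Counter(ys).most_common(1)[0][0]
    if depth >= max_depth or len(set(ys)) == 1:
        return ("leaf", majority)
    pairs = sorted(zip(xs, ys))
    best = None  # (weighted Gini impurity, threshold, split index)
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # thresholds can only fall between distinct values
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if best is None or score < best[0]:
            best = (score, (pairs[i - 1][0] + pairs[i][0]) / 2, i)
    if best is None or best[0] >= gini(ys):
        return ("leaf", majority)  # no split reduces impurity
    _, threshold, i = best
    left_xs, left_ys = zip(*pairs[:i])
    right_xs, right_ys = zip(*pairs[i:])
    return (
        "node",
        threshold,
        fit_tree_1d(list(left_xs), list(left_ys), max_depth, depth + 1),
        fit_tree_1d(list(right_xs), list(right_ys), max_depth, depth + 1),
    )


def predict_1d(tree, x):
    """Route one feature value down the tree and return the leaf label."""
    while tree[0] == "node":
        _, threshold, left, right = tree
        tree = left if x <= threshold else right
    return tree[1]


def f1_for_class(y_true, y_pred, positive):
    """F1 = 2TP / (2TP + FP + FN), treating `positive` as the class of interest."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    return 0.0 if tp == 0 else 2 * tp / (2 * tp + fp + fn)


def rank_features_by_f1(X, y, max_depth=3):
    """Score each column of X by the minority-class F1 of a 1-D tree; best first."""
    minority = min(Counter(y).items(), key=lambda kv: kv[1])[0]
    scores = []
    for j in range(len(X[0])):
        column = [row[j] for row in X]
        tree = fit_tree_1d(column, y, max_depth)
        preds = [predict_1d(tree, v) for v in column]
        # Assumption: scored on the training data itself (resubstitution).
        scores.append((j, f1_for_class(y, preds, minority)))
    return sorted(scores, key=lambda s: (-s[1], s[0]))


# Toy imbalanced data: feature 0 separates the two minority samples, feature 1 does not.
X = [[0.1, 5.0], [0.2, 3.0], [0.3, 4.0], [0.4, 1.0], [0.5, 2.0],
     [0.6, 5.0], [0.7, 3.0], [0.8, 4.0], [0.9, 1.0], [1.0, 2.0]]
y = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
ranking = rank_features_by_f1(X, y, max_depth=3)
print(ranking)
```

Limiting `max_depth` matters in this sketch because it scores on the training data: an unrestricted tree can isolate nearly every distinct value and reach an F1 of 1 on almost any feature. A held-out split or cross-validation would be a more cautious way to compute the per-feature score; which protocol the paper uses is not stated in the abstract.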
Pages: 1527–1541