Feature Selection in Imbalanced Data

被引:0
|
作者
Kamalov F. [1 ]
Thabtah F. [2 ]
Leung H.H. [3 ]
机构
[1] Canadian University of Dubai, Dubai
[2] Manukau Institute of Technology, Manukau
[3] UAE University, Al Ain
关键词
Big data; Data mining; F[!sub]1[!/sub]-score; Feature selection; Filter method; Imbalanced data; Machine learning;
D O I
10.1007/s40745-021-00366-5
中图分类号
学科分类号
摘要
The traditional feature selection methods are not suitable for imbalanced data as they tend to be biased towards the majority class. This problem is particularly acute in the field of medical diagnostics and fraud detection where the class distribution is highly skewed. In this paper, we propose a novel filter approach using decision tree-based F1-score. The F1-score incorporates the accuracy with respect to the minority class data and hence is a good measure in the case of imbalanced data. In the proposed implementation, the F1-score is calculated based on a 1-dimensional decision tree classifier resulting in a fast and effective feature evaluation method. Numerical experiments confirm that the proposed method achieves robust dimensionality reduction and accuracy results. In addition, the low computational complexity of the algorithm makes it a practical choice for big data applications. © 2021, The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature.
引用
收藏
页码:1527 / 1541
页数:14
相关论文
共 50 条
  • [1] Univariate feature selection on imbalanced data
    Chatterjee, Avishek
    Woodruff, Henry
    Lobbes, Marc
    Vallieres, Martin
    Seuntjens, Jan
    [J]. MEDICAL PHYSICS, 2019, 46 (11) : 5375 - 5375
  • [2] Evolutionary feature selection for imbalanced data
    Tusell Rey, Claudia C.
    Salinas Garcia, Viridiana
    Villuendas-Rey, Yenny
    [J]. 2023 MEXICAN INTERNATIONAL CONFERENCE ON COMPUTER SCIENCE, ENC, 2024,
  • [3] Causal Feature Selection With Imbalanced Data
    Ling, Zhaolong
    Wu, Jingxuan
    Zhang, Yiwen
    Zhou, Peng
    Yu, Kui
    Jiang, Bingbing
    Wu, Xindong
    [J]. IEEE TRANSACTIONS ON EMERGING TOPICS IN COMPUTATIONAL INTELLIGENCE, 2024,
  • [4] An Embedded Feature Selection Method for Imbalanced Data Classification
    Liu, Haoyue
    Zhou, MengChu
    Liu, Qing
    [J]. IEEE-CAA JOURNAL OF AUTOMATICA SINICA, 2019, 6 (03) : 703 - 715
  • [5] Feature Selection with Imbalanced Data for Software Defect Prediction
    Khoshgoftaar, Taghi M.
    Gao, Kehan
    [J]. EIGHTH INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND APPLICATIONS, PROCEEDINGS, 2009, : 235 - +
  • [6] Feature selection for high-dimensional imbalanced data
    Yin, Liuzhi
    Ge, Yong
    Xiao, Keli
    Wang, Xuehua
    Quan, Xiaojun
    [J]. NEUROCOMPUTING, 2013, 105 : 3 - 11
  • [7] An Embedded Feature Selection Method for Imbalanced Data Classification
    Haoyue Liu
    MengChu Zhou
    Qing Liu
    [J]. IEEE/CAA Journal of Automatica Sinica, 2019, 6 (03) : 703 - 715
  • [8] A Classification Method Based on Feature Selection for Imbalanced Data
    Liu, Yi
    Wang, Yanzhen
    Ren, Xiaoguang
    Zhou, Hao
    Diao, Xingchun
    [J]. IEEE ACCESS, 2019, 7 : 81794 - 81807
  • [9] Imbalanced Data Classification Based on Feature Selection Techniques
    Ksieniewicz, Pawel
    Wozniak, Michal
    [J]. INTELLIGENT DATA ENGINEERING AND AUTOMATED LEARNING (IDEAL 2018), PT II, 2018, 11315 : 296 - 303
  • [10] Feature Selection with High-Dimensional Imbalanced Data
    Van Hulse, Jason
    Khoshgoftaar, Taghi M.
    Napolitano, Amri
    Wald, Randall
    [J]. 2009 IEEE INTERNATIONAL CONFERENCE ON DATA MINING WORKSHOPS (ICDMW 2009), 2009, : 507 - 514