Feature Selection in Imbalanced Data

Authors
Kamalov F. [1]
Thabtah F. [2]
Leung H.H. [3]
Affiliations
[1] Canadian University of Dubai, Dubai
[2] Manukau Institute of Technology, Manukau
[3] UAE University, Al Ain
Keywords
Big data; Data mining; F1-score; Feature selection; Filter method; Imbalanced data; Machine learning
DOI
10.1007/s40745-021-00366-5
Abstract
The traditional feature selection methods are not suitable for imbalanced data as they tend to be biased towards the majority class. This problem is particularly acute in the field of medical diagnostics and fraud detection where the class distribution is highly skewed. In this paper, we propose a novel filter approach using decision tree-based F1-score. The F1-score incorporates the accuracy with respect to the minority class data and hence is a good measure in the case of imbalanced data. In the proposed implementation, the F1-score is calculated based on a 1-dimensional decision tree classifier resulting in a fast and effective feature evaluation method. Numerical experiments confirm that the proposed method achieves robust dimensionality reduction and accuracy results. In addition, the low computational complexity of the algorithm makes it a practical choice for big data applications. © 2021, The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature.
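To illustrate the idea in the abstract, here is a minimal sketch of a filter that scores each feature on its own by the minority-class F1-score of a one-dimensional classifier. It is not the authors' implementation: as a stand-in for the 1-dimensional decision tree, it assumes a single-threshold decision stump, binary labels with 1 as the minority class, and scoring on the training data itself. The names `f1_score`, `stump_f1`, and `rank_features` and the toy data are hypothetical.

```python
from typing import List, Sequence, Tuple


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1-score with respect to the minority class (assumed to be label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def stump_f1(x: Sequence[float], y: Sequence[int]) -> float:
    """Best F1 achievable by a single-threshold rule on one feature.

    Assumption: a depth-1 stump replaces the paper's 1-dimensional tree,
    and the threshold is chosen to maximise F1 on the data it is given.
    """
    values = sorted(set(x))
    # Candidate thresholds are midpoints between consecutive distinct values.
    thresholds = [(a + b) / 2 for a, b in zip(values, values[1:])]
    best = 0.0
    for thr in thresholds:
        # Try both orientations: minority above or below the threshold.
        for minority_above in (True, False):
            y_pred = [1 if (v > thr) == minority_above else 0 for v in x]
            best = max(best, f1_score(y, y_pred))
    return best


def rank_features(X: List[List[float]], y: Sequence[int]) -> List[Tuple[int, float]]:
    """Return (feature index, score) pairs sorted by descending F1."""
    n_features = len(X[0])
    scores = [(j, stump_f1([row[j] for row in X], y)) for j in range(n_features)]
    return sorted(scores, key=lambda s: s[1], reverse=True)


# Toy imbalanced data: 8 majority samples, 2 minority samples.
# Feature 0 separates the classes; feature 1 alternates independently of y.
y_toy = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
X_toy = [
    [0.1, 1], [0.2, 2], [0.3, 1], [0.4, 2], [0.5, 1],
    [0.6, 2], [0.7, 1], [0.8, 2], [5.0, 1], [6.0, 2],
]
ranking = rank_features(X_toy, y_toy)
print(ranking)
```

Each feature is evaluated independently with a cheap one-dimensional model, which is where the low cost noted in the abstract would come from. A more faithful version would fit a real decision tree per feature and estimate the F1-score on held-out data.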
Pages: 1527-1541
Number of pages: 14