Denying Evolution Resampling: An Improved Method for Feature Selection on Imbalanced Data

Times cited: 1
Authors
Quan, Li [1]
Gong, Tao [1]
Jiang, Kaida [1]
Affiliation
[1] Donghua University, College of Information Science and Technology, Shanghai 201620, People's Republic of China
Funding
National Natural Science Foundation of China
Keywords
classification algorithms; imbalanced data; similarity measure; evolutionary process
DOI
10.3390/electronics12153212
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Imbalanced data classification is an important problem in computer science. Traditional classification algorithms often lose accuracy when the class distribution is uneven, so measures are needed to rebalance the dataset and improve the model's classification accuracy. We design a data resampling method to improve the accuracy of classification detection. The method uses a negative selection process to constrain the data evolution process. By combining the CRITIC (Criteria Importance Through Intercriteria Correlation) method with regression coefficients, we set the crossover selection probabilities of elite genes, which drives an evolutionary resampling process. Using independent weights improves the feature analysis by 3%. We evaluate the resampled data on publicly available datasets using traditional logistic regression with cross-validation. For both resampling results produced by the proposed method, the F1 scores of logistic regression under five-fold cross-validation are more stable than those obtained with the other resampling methods. These F1 score results verify the effectiveness of the proposed method.
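The abstract states that CRITIC is combined with regression coefficients to set crossover selection probabilities for elite genes, but this record does not give the combination rule. The sketch below therefore shows only the standard CRITIC weighting step (contrast times conflict), as a minimal illustration under stated assumptions: every feature is min-max normalized and treated as benefit-type, population standard deviation is used as the contrast measure, and the function names `min_max_normalize`, `pearson`, and `critic_weights` are hypothetical, not the authors' code.

```python
from math import sqrt
from statistics import pstdev


def min_max_normalize(column):
    """Scale one feature to [0, 1]; assumes every feature is benefit-type."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # A constant feature has no contrast, so map it to all zeros.
        return [0.0] * len(column)
    return [(v - lo) / (hi - lo) for v in column]


def pearson(a, b):
    """Pearson correlation; returns 0.0 when either column is constant."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0
    return cov / sqrt(va * vb)


def critic_weights(rows):
    """Standard CRITIC weights for a samples-by-features matrix (list of rows).

    Hypothetical helper: the paper's further combination with regression
    coefficients is not reproduced here.
    """
    columns = [min_max_normalize(list(col)) for col in zip(*rows)]
    info = []
    for cj in columns:
        # Conflict: how much this feature disagrees with every feature (1 - r).
        conflict = sum(1.0 - pearson(cj, ck) for ck in columns)
        # Information content = contrast (spread) times conflict.
        info.append(pstdev(cj) * conflict)
    total = sum(info)
    if total == 0:
        # Degenerate case (all features constant): fall back to equal weights.
        return [1.0 / len(columns)] * len(columns)
    return [c / total for c in info]
```

For example, two features whose values are permutations of each other receive equal weight and a constant feature receives weight 0.0, so `critic_weights([[1.0, 2.0, 7.0], [2.0, 1.0, 7.0], [3.0, 4.0, 7.0], [4.0, 3.0, 7.0]])` returns approximately `[0.5, 0.5, 0.0]`. How these weights are then merged with regression coefficients into crossover probabilities is specific to the paper and is left out of this sketch.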
Pages: 19