Using Cost-Sensitive Learning and Feature Selection Algorithms to Improve the Performance of Imbalanced Classification

Cited by: 35
Authors
Feng, Fang [1, 2]
Li, Kuan-Ching [3 ]
Shen, Jun [4, 5]
Zhou, Qingguo [1 ]
Yang, Xuhui [1 ]
Affiliations
[1] Lanzhou Univ, Sch Informat Sci & Engn, Lanzhou 730000, Peoples R China
[2] Lanzhou Inst Technol, Sch Elect & Informat Engn, Lanzhou 730050, Peoples R China
[3] Providence Univ, Dept Comp Sci & Informat Engn, Taichung 43301, Taiwan
[4] Univ Wollongong, Sch Comp & Informat Technol, Wollongong, NSW 2522, Australia
[5] MIT, Res Lab Elect, Dept EE & CS, 77 Massachusetts Ave, Cambridge, MA 02139 USA
Funding
National Natural Science Foundation of China
Keywords
Feature extraction; Support vector machines; Linear programming; Credit cards; Prediction algorithms; Indexes; Licenses; Imbalanced data; cost-sensitive; general vector machine; binary ant lion optimizer; SUPPORT VECTOR MACHINES; ENSEMBLE; CLASSIFIERS;
DOI
10.1109/ACCESS.2020.2987364
Chinese Library Classification number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
The imbalanced data problem is widespread in network intrusion detection, spam filtering, biomedical engineering, finance, and science, and it poses a challenge in many real-life data-intensive applications. Classifier bias occurs when traditional classification algorithms are applied to imbalanced data. The General Vector Machine (GVM) algorithm is known to have good generalization ability, but it does not work well for imbalanced classification. The state-of-the-art Binary Ant Lion Optimizer (BALO) algorithm, in turn, has high exploitability and a fast convergence rate. Based on these facts, this paper proposes a Cost-sensitive Feature selection General Vector Machine (CFGVM) algorithm, built on the GVM and BALO algorithms, that tackles the imbalanced classification problem by assigning different cost weights to different classes of samples. In this method, the BALO algorithm determines the cost weights and extracts the more significant features to improve classification performance. Experiments conducted on eleven imbalanced data sets show that the CFGVM algorithm significantly improves the classification performance on minority class samples. Compared with similar algorithms and state-of-the-art algorithms, the proposed algorithm performs significantly better and produces better classification results.
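The two ideas the abstract combines, class-dependent cost weights and a binary feature mask, can be illustrated with a minimal sketch. It implements neither GVM nor BALO; the function names, the toy scores, and the use of the geometric mean of the per-class recalls (G-mean) as a quality measure are assumptions made for illustration, not details taken from the paper.

```python
from math import sqrt

def select_features(rows, mask):
    """Keep only the columns whose mask bit is 1 (a binary feature mask)."""
    return [[v for v, keep in zip(row, mask) if keep] for row in rows]

def cost_sensitive_predict(p_minority, cost_minority, cost_majority=1.0):
    """Predict the minority class (1) when the expected cost of missing it
    exceeds the expected cost of a false alarm."""
    return [1 if p * cost_minority > (1.0 - p) * cost_majority else 0
            for p in p_minority]

def g_mean(y_true, y_pred):
    """Geometric mean of minority-class recall and majority-class recall."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    pos = sum(y_true)
    neg = len(y_true) - pos
    return sqrt((tp / pos) * (tn / neg))

# Hypothetical toy data: 2 minority samples, 8 majority samples, and the
# minority-class probabilities produced by some base classifier.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
scores = [0.45, 0.30, 0.35, 0.20, 0.10, 0.10, 0.05, 0.05, 0.15, 0.22]

# With equal costs every sample is assigned to the majority class.
equal_cost = cost_sensitive_predict(scores, cost_minority=1.0)
# A higher minority cost recovers both minority samples at the price of
# one false alarm.
weighted = cost_sensitive_predict(scores, cost_minority=3.0)

print(g_mean(y_true, equal_cost))  # 0.0
print(g_mean(y_true, weighted))    # sqrt(7/8), about 0.935

# A candidate feature mask simply drops the unselected columns.
print(select_features([[1, 2, 3], [4, 5, 6]], [1, 0, 1]))  # [[1, 3], [4, 6]]
```

In a wrapper-style search, an optimizer would propose a (mask, cost weight) pair, train the classifier on the selected columns, and score the pair with a measure such as the G-mean above; how CFGVM encodes and scores its candidates is described in the paper itself, not here.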
Pages: 69979-69996
Page count: 18