Machine learning for mining imbalanced data

Cited by: 0
Authors
Arafat, Md. Yasir [1]
Hoque, Sabera [2]
Xu, Shuxiang [3]
Farid, Dewan Md [4]
Affiliations
[1] Wipro Limited (Technical Lead), India
[2] Computer Science and Engineering Department, United International University, Bangladesh
[3] School of Technology, Environments and Design, University of Tasmania, Australia
[4] United International University, Bangladesh
Keywords
Data mining; Adaptive boosting; Machine learning
DOI
Not available
Abstract
Mining imbalanced data, also known as the class imbalance problem, is one of the most challenging tasks in machine learning for data mining applications. Achieving accurate overall performance in imbalanced classification with machine learning techniques is difficult because majority class instances heavily outnumber minority class instances. An unequal class distribution is very common in real-world high-dimensional datasets, where binary classification is more frequent than multi-class classification. Most existing machine learning algorithms focus on classifying majority class instances while ignoring or misclassifying minority class instances. Several techniques have been introduced in recent decades for imbalanced data classification, each with its own advantages and disadvantages. In this paper, we study and compare 12 widely used imbalanced data classification methods (SMOTE, AdaBoost, RUSBoost, EUSBoost, SMOTEBoost, MSMOTEBoost, DataBoost, Easy Ensemble, BalanceCascade, OverBagging, UnderBagging, and SMOTEBagging) to characterize their behavior and performance on 27 imbalanced datasets. In general, combining over-sampling and under-sampling techniques with ensemble classifiers such as bagging and boosting achieves the highest accuracy for classifying both majority and minority class instances. Additionally, an extensive and critical review of existing imbalanced learning algorithms is provided with detailed discussion. Based on our findings and the reviewed papers, we offer practical suggestions and further research directions for imbalanced learning. © International Association of Engineers.
Pages: 332-348
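
To make the over-sampling idea behind SMOTE (the first method named in the abstract) concrete, the following is a minimal sketch, not the authors' implementation. It assumes the commonly described SMOTE scheme: each synthetic minority sample is placed at a random point on the line segment between a minority sample and one of its k nearest minority neighbours. The function name `smote_sketch`, its parameters, and the toy data are hypothetical and chosen only for illustration.

```python
import math
import random


def smote_sketch(minority, n_new, k=3, seed=0):
    """Return n_new synthetic points interpolated between minority samples.

    minority: list of feature vectors (lists of floats), at least 2 samples.
    k: number of nearest minority neighbours considered for each base sample.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by Euclidean distance to the base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        # Interpolate at a random position between base and neighbour.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy minority class: four points on the corners of the unit square.
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote_sketch(minority, n_new=6, k=2)
```

Because every synthetic point lies on a segment between two existing minority samples, the new points stay inside the region already occupied by the minority class. The boosting and bagging variants listed in the abstract would apply a resampling step of this kind before or during ensemble training; that part is not shown here.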