A Comparison of Two Approaches to Data Mining from Imbalanced Data

Cited by: 0
Authors
Jerzy W. Grzymala-Busse
Jerzy Stefanowski
Szymon Wilk
Affiliations
[1] University of Kansas,Department of Electrical Engineering and Computer Science
[2] Polish Academy of Sciences,Institute of Computer Science
[3] Poznan University of Technology,Institute of Computing Science
Source
Journal of Intelligent Manufacturing, 2005, 16 (06)
Keywords
Data mining; EXPLORE rule induction algorithm; imbalanced data sets; LEM2 rule induction algorithm
DOI
Not available
Abstract
Our objective is to compare two data mining approaches to dealing with imbalanced data sets. The first approach keeps the original rule set, induced by the LEM2 (Learning from Examples Module, version 2) algorithm, and changes the rule strength of all rules for the smaller class (concept) during classification. In the second approach, rule induction is split: the rule set for the larger class is induced by LEM2, while the rule set for the smaller class is induced by EXPLORE, another data mining algorithm. The results of our experiments show that both approaches increase sensitivity compared to the original LEM2. However, the difference in performance between the two approaches is statistically insignificant. Thus the appropriate approach for dealing with imbalanced data sets should be selected individually for each specific data set.
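The first approach in the abstract (multiplying the strength of every rule for the smaller class at classification time) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `Rule` class, the `classify` and `sensitivity` functions, the `multiplier` parameter, and the toy rules are all hypothetical. The voting scheme is also simplified to a plain sum of the strengths of completely matching rules, and leaves out any other factors the actual classification system may use.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    conditions: dict  # attribute name -> required value
    decision: str     # class (concept) the rule points to
    strength: int     # assumed: number of training cases the rule covers correctly


def matches(rule, case):
    """True if the case satisfies every condition of the rule."""
    return all(case.get(attr) == value for attr, value in rule.conditions.items())


def classify(rules, case, minority, multiplier=1.0):
    """Vote with summed rule strengths; minority-class rules are scaled by `multiplier`."""
    support = {}
    for rule in rules:
        if matches(rule, case):
            factor = multiplier if rule.decision == minority else 1.0
            support[rule.decision] = support.get(rule.decision, 0.0) + rule.strength * factor
    if not support:
        return None  # no rule matches; partial matching is not modelled here
    return max(support, key=support.get)


def sensitivity(rules, cases, labels, minority, multiplier=1.0):
    """Fraction of minority-class cases that are classified as the minority class."""
    positives = [c for c, y in zip(cases, labels) if y == minority]
    if not positives:
        return 0.0
    hits = sum(1 for c in positives if classify(rules, c, minority, multiplier) == minority)
    return hits / len(positives)


# Toy rule set (hypothetical): one weak minority rule, one strong majority rule.
rules = [
    Rule({"fever": "high"}, "sick", 2),
    Rule({"cough": "no"}, "healthy", 5),
]
cases = [
    {"fever": "high", "cough": "no"},   # both rules match
    {"fever": "high", "cough": "yes"},  # only the minority rule matches
]
labels = ["sick", "sick"]

baseline = sensitivity(rules, cases, labels, "sick")                 # unchanged strengths
boosted = sensitivity(rules, cases, labels, "sick", multiplier=3.0)  # minority strengths tripled
```

In this toy setting the first case is outvoted by the majority rule until the minority rule's strength is scaled up, which is the kind of sensitivity gain the abstract reports. The rule set itself is never changed; only the voting weights are.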
Pages: 565 - 573
Number of pages: 8
Related papers
50 in total
  • [1] A comparison of two approaches to data mining from imbalanced data
    Grzymala-Busse, JW
    Stefanowski, J
    Wilk, S
    KNOWLEDGE-BASED INTELLIGENT INFORMATION AND ENGINEERING SYSTEMS, PT 1, PROCEEDINGS, 2004, 3213 : 757 - 763
  • [2] A comparison of two approaches to data mining from imbalanced data
    Grzymala-Busse, JW
    Stefanowski, J
    Wilk, S
    JOURNAL OF INTELLIGENT MANUFACTURING, 2005, 16 (06) : 565 - 573
  • [3] Comparison of Two Main Approaches for Handling Imbalanced Data in Churn Prediction Problem
    Nam N Nguyen
    Anh T Duong
    JOURNAL OF ADVANCES IN INFORMATION TECHNOLOGY, 2021, 12 (01) : 29 - 35
  • [4] Data Mining on Imbalanced Data Sets
    Gu, Qiong
    Cai, Zhihua
    Zhu, Li
    Huang, Bo
    2008 INTERNATIONAL CONFERENCE ON ADVANCED COMPUTER THEORY AND ENGINEERING, 2008, : 1020 - 1024
  • [5] A hybrid system for imbalanced data mining
    Lee, Zne-Jung
    Lee, Chou-Yuan
    Chou, So-Tsung
    Ma, Wei-Ping
    Ye, Fulan
    Chen, Zhen
    MICROSYSTEM TECHNOLOGIES-MICRO-AND NANOSYSTEMS-INFORMATION STORAGE AND PROCESSING SYSTEMS, 2020, 26 (09) : 3043 - 3047
  • [6] Machine learning for mining imbalanced data
    Arafat, Md. Yasir
    Hoque, Sabera
    Xu, Shuxiang
    Farid, Dewan Md
    IAENG International Journal of Computer Science, 2019, 46 (02) : 332 - 348
  • [7] A hybrid system for imbalanced data mining
    Zne-Jung Lee
    Chou-Yuan Lee
    So-Tsung Chou
    Wei-Ping Ma
    Fulan Ye
    Zhen Chen
    Microsystem Technologies, 2020, 26 : 3043 - 3047
  • [8] A practical comparison on GIS data of two data mining algorithms
    Mihai, Dana
    Mocanu, Mihai
    2018 2ND EUROPEAN CONFERENCE ON ELECTRICAL ENGINEERING AND COMPUTER SCIENCE (EECS 2018), 2018, : 195 - 200
  • [9] Two density-based sampling approaches for imbalanced and overlapping data
    Mayabadi, Sima
    Saadatfar, Hamid
    KNOWLEDGE-BASED SYSTEMS, 2022, 241
  • [10] Data mining based fuzzy classification algorithm for imbalanced data
    Xu, Le
    Chow, Mo-Yuen
    Taylor, Leroy S.
    2006 IEEE INTERNATIONAL CONFERENCE ON FUZZY SYSTEMS, VOLS 1-5, 2006, : 825 - +