A Comparison of Two Approaches to Data Mining from Imbalanced Data

Authors
Jerzy W. Grzymala-Busse
Jerzy Stefanowski
Szymon Wilk
Affiliations
[1] University of Kansas, Department of Electrical Engineering and Computer Science
[2] Polish Academy of Sciences, Institute of Computer Science
[3] Poznan University of Technology, Institute of Computing Science
Keywords
Data mining; EXPLORE rule induction algorithm; imbalanced data sets; LEM2 rule induction algorithm
DOI
Not available
Abstract
Our objective is to compare two data mining approaches to handling imbalanced data sets. The first approach keeps the original rule set induced by the LEM2 (Learning from Examples Module, version 2) algorithm and, during classification, changes the strength of all rules for the smaller class (concept). In the second approach, rule induction is split: the rule set for the larger class is induced by LEM2, while the rule set for the smaller class is induced by EXPLORE, another data mining algorithm. Our experiments show that both approaches increase sensitivity compared with the original LEM2. However, the difference in performance between the two approaches is not statistically significant. Therefore, the approach for dealing with imbalanced data sets should be selected individually for each data set.
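The first approach can be illustrated with a minimal sketch. It assumes a LERS-style voting scheme in which every fully matching rule adds its strength times its specificity (number of conditions) to the support of its concept, and the strength of every smaller-class rule is scaled by a multiplier at classification time while the induced rule set itself is left unchanged. The names `Rule`, `classify`, `sensitivity`, and `multiplier` are hypothetical; partial matching, rule induction by LEM2, and the EXPLORE-based second approach are not modeled.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class Rule:
    """A certain rule: IF all (attribute, value) conditions hold THEN concept."""
    conditions: Tuple[Tuple[str, str], ...]
    concept: str
    strength: int  # assumed: number of training cases correctly covered by the rule

    def matches(self, case: Dict[str, str]) -> bool:
        return all(case.get(attr) == val for attr, val in self.conditions)


def classify(case: Dict[str, str], rules: List[Rule],
             minority: str, multiplier: float) -> Optional[str]:
    """Vote by support = sum of strength * specificity over fully matching rules.

    Strength of rules for the minority concept is scaled by `multiplier`
    (hypothetical parameter); multiplier=1.0 reproduces the unchanged rule set.
    """
    support: Dict[str, float] = {}
    for rule in rules:
        if not rule.matches(case):
            continue
        strength = rule.strength * (multiplier if rule.concept == minority else 1.0)
        specificity = len(rule.conditions)
        support[rule.concept] = support.get(rule.concept, 0.0) + strength * specificity
    if not support:
        return None  # partial matching is not modeled in this sketch
    return max(support, key=support.get)


def sensitivity(y_true: Sequence[str], y_pred: Sequence[Optional[str]],
                positive: str) -> float:
    """Fraction of cases of the positive (smaller) class that are classified as such."""
    predicted_for_positives = [p for t, p in zip(y_true, y_pred) if t == positive]
    if not predicted_for_positives:
        return 0.0
    hits = sum(1 for p in predicted_for_positives if p == positive)
    return hits / len(predicted_for_positives)


if __name__ == "__main__":
    rules = [
        Rule((("a", "x"),), "majority", 10),
        Rule((("a", "x"), ("b", "y")), "minority", 2),
    ]
    case = {"a": "x", "b": "y"}
    print(classify(case, rules, minority="minority", multiplier=1.0))  # majority (10 vs 4)
    print(classify(case, rules, minority="minority", multiplier=5.0))  # minority (10 vs 20)
```

A larger `multiplier` shifts borderline cases toward the smaller class, which typically raises sensitivity at the cost of specificity; the value 5.0 above is arbitrary.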
Pages: 565-573
Page count: 8