A Comparison of Two Approaches to Data Mining from Imbalanced Data

Authors
Jerzy W. Grzymala-Busse
Jerzy Stefanowski
Szymon Wilk
Affiliations
[1] University of Kansas, Department of Electrical Engineering and Computer Science
[2] Polish Academy of Sciences, Institute of Computer Science
[3] Poznan University of Technology, Institute of Computing Science
Keywords
Data mining; EXPLORE rule induction algorithm; imbalanced data sets; LEM2 rule induction algorithm
DOI
Not available
Abstract
Our objective is to compare two data mining approaches to dealing with imbalanced data sets. The first approach keeps the original rule set, induced by the LEM2 (Learning from Example Module) algorithm, and changes the rule strength of all rules for the smaller class (concept) during classification. In the second approach, rule induction is split: the rule set for the larger class is induced by LEM2, while the rule set for the smaller class is induced by EXPLORE, another data mining algorithm. The results of our experiments show that both approaches increase sensitivity compared to the original LEM2. However, the difference in performance between the two approaches is statistically insignificant. Thus the appropriate approach for dealing with imbalanced data sets should be selected individually for each data set.
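The first approach described in the abstract (keeping the rule set and raising the strength of the smaller class's rules at classification time) can be illustrated with a minimal sketch. This is not the authors' implementation: the `Rule` class, the `classify` function, the `multiplier` parameter, and the toy rules are all hypothetical, and the voting scheme is simplified to a sum of strengths of matching rules.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    """A hypothetical decision rule: IF all conditions hold THEN concept."""
    conditions: dict  # attribute name -> required value
    concept: str      # class (concept) the rule points to
    strength: int     # number of training cases the rule classifies correctly

    def matches(self, case):
        # A rule matches when every condition agrees with the case.
        return all(case.get(attr) == val for attr, val in self.conditions.items())


def classify(case, rules, minority, multiplier=1.0):
    """Vote over matching rules; strengths of minority-class rules are
    scaled by `multiplier` (an assumed, simplified voting scheme)."""
    support = {}
    for rule in rules:
        if rule.matches(case):
            factor = multiplier if rule.concept == minority else 1.0
            support[rule.concept] = support.get(rule.concept, 0.0) + rule.strength * factor
    if not support:
        return None  # no rule matches the case
    return max(support, key=support.get)


# Toy rule set (invented for illustration only).
rules = [
    Rule({"fever": "yes"}, "healthy", 6),  # larger class, strong rule
    Rule({"cough": "yes"}, "sick", 2),     # smaller class, weak rule
]
case = {"fever": "yes", "cough": "yes"}

print(classify(case, rules, minority="sick"))                  # healthy (6 vs 2)
print(classify(case, rules, minority="sick", multiplier=4.0))  # sick (6 vs 8)
```

With the multiplier left at 1.0 the stronger majority-class rule wins; raising it shifts conflicting cases toward the smaller class, which is how sensitivity can increase without re-inducing any rules.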
Pages: 565-573
Page count: 8