A Comparison of Two Approaches to Data Mining from Imbalanced Data

Cited by: 0

Authors:

Jerzy W. Grzymala-Busse

Jerzy Stefanowski

Szymon Wilk

Affiliations:

[1] University of Kansas, Department of Electrical Engineering and Computer Science

[2] Polish Academy of Sciences, Institute of Computer Science

[3] Poznan University of Technology, Institute of Computing Science

Source:

Journal of Intelligent Manufacturing | 2005 / Vol. 16

Keywords:

Data mining; EXPLORE rule induction algorithm; imbalanced data sets; LEM2 rule induction algorithm

DOI:

Not available

Abstract:

Our objective is to compare two data mining approaches to dealing with imbalanced data sets. The first approach keeps the original rule set induced by the LEM2 (Learning from Examples Module, version 2) algorithm unchanged and, during classification, changes the strength of every rule for the smaller class (concept). In the second approach, rule induction is split: the rule set for the larger class is induced by LEM2, while the rule set for the smaller class is induced by EXPLORE, another data mining algorithm. Our experiments show that both approaches increase sensitivity compared with the original LEM2. However, the difference in performance between the two approaches is not statistically significant. Thus, the approach for dealing with imbalanced data should be selected individually for each specific data set.
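
The first approach (changing rule strengths for the smaller class at classification time) can be illustrated with a simplified sketch. It assumes a voting scheme in which each fully matching rule adds its strength to the support of its concept and the concept with the largest support wins; partial matching and rule specificity are omitted. The `Rule` class, the `multiplier` parameter, and the toy rules and cases are hypothetical and are not taken from the paper. The sketch does not reproduce LEM2 or EXPLORE rule induction; it only shows how scaling minority-class rule strengths can raise sensitivity without changing the rule set.

```python
from dataclasses import dataclass


# Hypothetical rule representation (assumption): a conjunction of
# (attribute, value) conditions predicting one concept, with a strength,
# e.g. the number of training cases the rule correctly classifies.
@dataclass
class Rule:
    conditions: dict
    concept: str
    strength: float

    def matches(self, case: dict) -> bool:
        # A rule matches only if every condition holds (complete matching).
        return all(case.get(a) == v for a, v in self.conditions.items())


def classify(rules, case, minority, multiplier=1.0):
    """Return the concept with the largest total support, or None.

    Support for a concept is the sum of strengths of all matching rules
    for that concept. Strengths of minority-class rules are scaled by the
    hypothetical `multiplier`; the rule set itself is left unchanged.
    """
    support = {}
    for r in rules:
        if r.matches(case):
            factor = multiplier if r.concept == minority else 1.0
            support[r.concept] = support.get(r.concept, 0.0) + r.strength * factor
    if not support:
        return None  # no rule matches; partial matching is not modeled here
    return max(support, key=support.get)


def sensitivity(y_true, y_pred, positive):
    """Fraction of actual positive (minority) cases classified as positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    pos = sum(1 for t in y_true if t == positive)
    return tp / pos if pos else 0.0


# Toy example (invented data): "sick" is the smaller class.
rules = [
    Rule({"temp": "high"}, "healthy", 10),
    Rule({"cough": "yes"}, "sick", 3),
]
cases = [
    {"temp": "high", "cough": "yes"},
    {"temp": "high", "cough": "no"},
    {"temp": "normal", "cough": "yes"},
]
y_true = ["sick", "healthy", "sick"]

for m in (1.0, 5.0):
    y_pred = [classify(rules, c, "sick", m) for c in cases]
    print(m, sensitivity(y_true, y_pred, "sick"))  # 0.5 for m=1.0, 1.0 for m=5.0
```

With the original strengths, the first case is assigned to the larger class (support 10 versus 3); multiplying the minority-class strength by 5 flips it to "sick" (15 versus 10), so sensitivity rises from 0.5 to 1.0 on this toy data.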

Pages: 565-573

Page count: 8
