Dynamic selection of normalization techniques using data complexity measures

Cited by: 134

Authors:

Jain, Sukirty [1]

Shukla, Sanyam [1]

Wadhvani, Rajesh [1]

Affiliation:

[1] Maulana Azad Natl Inst Technol, Bhopal 462007, Madhya Pradesh, India

Source:

EXPERT SYSTEMS WITH APPLICATIONS | 2018 / Vol. 106

Keywords:

Data complexity; Data preprocessing; Min-max normalization; z-score normalization; Gaussian Kernel ELM; EXTREME LEARNING-MACHINE; CLASSIFIERS; SET

DOI:

10.1016/j.eswa.2018.04.008

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

Data preprocessing is an important step in designing a classification model. Normalization is one of the preprocessing techniques used to handle out-of-bounds attributes. This work develops 14 classification models, using different learning algorithms, for dynamic selection of the normalization technique. It extracts 12 data complexity measures for 48 datasets drawn from the KEEL dataset repository. Each of these datasets is normalized using the min-max and z-score normalization techniques. The G-mean index is estimated for these normalized datasets using the Gaussian Kernel Extreme Learning Machine (KELM) in order to determine the best-suited normalization technique. The data complexity measures, along with the best-suited normalization technique, are used as input for developing the aforementioned dynamic models. These models predict the most suitable normalization technique based on the estimated data complexity measures of the dataset. The results show that the models developed using Gaussian Kernel ELM (KELM) and Support Vector Machine (SVM) give promising results for most of the evaluated classification problems. (C) 2018 Elsevier Ltd. All rights reserved.
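
The two normalization techniques and the G-mean index named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes the textbook definitions: min-max rescales an attribute to [0, 1], z-score centers it to zero mean and unit (population) standard deviation, and G-mean is the geometric mean of the per-class recalls. The function names, the handling of constant attributes, and the tie-breaking rule in `pick_normalization` are hypothetical choices made for illustration only.

```python
import math
from statistics import mean, pstdev


def min_max(column):
    """Rescale one attribute to [0, 1]; a constant attribute maps to 0.0 (assumption)."""
    lo, hi = min(column), max(column)
    span = hi - lo
    if span == 0:
        return [0.0 for _ in column]
    return [(x - lo) / span for x in column]


def z_score(column):
    """Center one attribute to zero mean and unit population standard deviation."""
    mu, sigma = mean(column), pstdev(column)
    if sigma == 0:
        return [0.0 for _ in column]
    return [(x - mu) / sigma for x in column]


def g_mean(y_true, y_pred):
    """Geometric mean of per-class recall (assumed definition of the G-mean index)."""
    recalls = []
    for label in sorted(set(y_true)):
        total = sum(1 for t in y_true if t == label)
        hits = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        recalls.append(hits / total)
    return math.prod(recalls) ** (1.0 / len(recalls))


def pick_normalization(g_mean_min_max, g_mean_z_score):
    """Label a dataset with the technique that scored the higher G-mean.

    Ties go to min-max here; that tie-breaking rule is an assumption.
    """
    return "min-max" if g_mean_min_max >= g_mean_z_score else "z-score"
```

Under these assumptions, the label returned by `pick_normalization` for each dataset would play the role of the target that the dynamic models learn to predict from the data complexity measures.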

Pages: 252-262

Page count: 11

