Dynamic selection of normalization techniques using data complexity measures

Cited by: 134
Authors
Jain, Sukirty [1 ]
Shukla, Sanyam [1 ]
Wadhvani, Rajesh [1 ]
Affiliations
[1] Maulana Azad Natl Inst Technol, Bhopal 462007, Madhya Pradesh, India
Keywords
Data complexity; Data preprocessing; Min-max normalization; z-score normalization; Gaussian Kernel ELM; EXTREME LEARNING-MACHINE; CLASSIFIERS; SET;
DOI
10.1016/j.eswa.2018.04.008
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Data preprocessing is an important step in designing a classification model. Normalization is one of the preprocessing techniques used to handle out-of-bounds attributes. This work develops 14 classification models using different learning algorithms for dynamic selection of the normalization technique. This work extracts 12 data complexity measures for 48 datasets drawn from the KEEL dataset repository. Each of these datasets is normalized using the min-max and z-score normalization techniques. The G-mean index is estimated for these normalized datasets using Gaussian Kernel Extreme Learning Machine (KELM) in order to determine the best-suited normalization technique. The data complexity measures, along with the best-suited normalization technique, are used as input for developing the aforementioned dynamic models. These models predict the most suitable normalization technique based on the estimated data complexity measures of the dataset. The results show that the models developed using Gaussian Kernel ELM (KELM) and Support Vector Machine (SVM) give promising results for most of the evaluated classification problems. (C) 2018 Elsevier Ltd. All rights reserved.
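The two normalization techniques and the G-mean index named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names are hypothetical, the G-mean is assumed to be the commonly used geometric mean of per-class recall, the z-score uses the population standard deviation, and ties between the two techniques are assumed to go to min-max. The abstract specifies none of these details.

```python
from statistics import mean, pstdev


def min_max_normalize(column, low=0.0, high=1.0):
    """Rescale a numeric column linearly into [low, high]."""
    lo, hi = min(column), max(column)
    if hi == lo:  # constant column: map every value to the lower bound
        return [low for _ in column]
    scale = (high - low) / (hi - lo)
    return [low + (x - lo) * scale for x in column]


def z_score_normalize(column):
    """Shift a column to mean 0 and scale it to unit population std. dev."""
    mu = mean(column)
    sigma = pstdev(column)
    if sigma == 0:  # constant column: every value equals the mean
        return [0.0 for _ in column]
    return [(x - mu) / sigma for x in column]


def g_mean(y_true, y_pred):
    """Geometric mean of per-class recall (assumed definition of G-mean)."""
    classes = sorted(set(y_true))
    product = 1.0
    for c in classes:
        total = sum(1 for t in y_true if t == c)
        hits = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        product *= hits / total
    return product ** (1.0 / len(classes))


def pick_normalization(g_mean_min_max, g_mean_z_score):
    """Label a dataset with the technique that gave the higher G-mean.

    Ties go to min-max; this tie-break is an assumption of the sketch.
    """
    return "min-max" if g_mean_min_max >= g_mean_z_score else "z-score"
```

In the workflow the abstract describes, a label like the one returned by `pick_normalization` would be paired with a dataset's complexity measures to form one training example for the selection models. The classifier that produces the predictions scored by `g_mean` (KELM in the paper) is outside the scope of this sketch.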
Pages: 252 - 262
Number of pages: 11
Related papers
50 in total
  • [31] Dynamic service selection in workflows using performance data
    Walker, David W.
    Huang, Lican
    Rana, Omer F.
    Huang, Yan
    SCIENTIFIC PROGRAMMING, 2007, 15 (04) : 235 - 247
  • [32] Design of service interfaces for e-business applications using data normalization techniques
    Feuerlicht G.
    Information Systems and e-Business Management, 2005, 3 (4) : 363 - 376
  • [33] DEVELOPMENT OF J-R CURVES FROM OBSOLETE DATA USING NORMALIZATION TECHNIQUES
    WONG, R
    HERRERA, R
    ZHOU, Z
    LANDES, JD
    ENGINEERING FRACTURE MECHANICS, 1990, 37 (01) : 153 - 161
  • [34] Data Leakage Detection Using Dynamic Data Structure and Classification Techniques
    Guevara Maldonado, Cesar Byron
    INGE CUC, 2015, 11 (01) : 79 - 84
  • [35] A survey of dynamic replication and replica selection strategies based on data mining techniques in data grids
    Hamrouni, T.
    Slimani, S.
    Ben Charrada, F.
    ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2016, 48 : 140 - 158
  • [36] Can classification performance be predicted by complexity measures? A study using microarray data
    Moran-Fernandez, L.
    Bolon-Canedo, V.
    Alonso-Betanzos, A.
    KNOWLEDGE AND INFORMATION SYSTEMS, 2017, 51 (03) : 1067 - 1090
  • [37] Can classification performance be predicted by complexity measures? A study using microarray data
    L. Morán-Fernández
    V. Bolón-Canedo
    A. Alonso-Betanzos
    Knowledge and Information Systems, 2017, 51 : 1067 - 1090
  • [38] Using Supervised Complexity Measures in the Analysis of Cancer Gene Expression Data Sets
    Costa, Ivan G.
    Lorena, Ana C.
    Peres, Liciana R. M. P. y
    de Souto, Marcilio C. P.
    ADVANCES IN BIOINFORMATICS AND COMPUTATIONAL BIOLOGY, PROCEEDINGS, 2009, 5676 : 48 - +
  • [39] Guidelines for using variable selection techniques in data envelopment analysis
    Nataraja, Niranjan R.
    Johnson, Andrew L.
    EUROPEAN JOURNAL OF OPERATIONAL RESEARCH, 2011, 215 (03) : 662 - 669
  • [40] Feature Selection in Big Data using Filter Based Techniques
    Srinivas, Sumitra K.
    Kancharla, Gangadhara Rao
    2019 4TH MEC INTERNATIONAL CONFERENCE ON BIG DATA AND SMART CITY (ICBDSC), 2019, : 139 - 145