Simultaneous feature and instance selection in big noisy data using memetic variable neighborhood search

Cited by: 8
Authors
Lin, Chun-Cheng [1 ,2 ,3 ]
Kang, Jia-Rong [4 ]
Liang, Yu-Lin [1 ]
Kuo, Chih-Chi [1 ]
Affiliations
[1] Natl Yang Ming Chiao Tung Univ, Dept Ind Engn & Management, Hsinchu 300, Taiwan
[2] Asia Univ, Dept Business Adm, Taichung 413, Taiwan
[3] China Med Univ, China Med Univ Hosp, Dept Med Res, Taichung 404, Taiwan
[4] Tatung Univ, Dept Informat Management, Taipei 104, Taiwan
Keywords
Big data analysis; Noisy data; Feature selection; Instance selection; Metaheuristic; DIFFERENTIAL EVOLUTION; GENETIC ALGORITHMS; HARMONY SEARCH; OPTIMIZATION; HYBRID; CLASSIFICATION;
DOI
10.1016/j.asoc.2021.107855
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In smart factories, the data collected by Internet-of-Things sensors is enormous and contains considerable noise and many missing values. To address this big data problem, metaheuristics are among the main approaches to data preprocessing, i.e., instance selection or feature selection before model training. Previous metaheuristic approaches have rarely considered simultaneous instance and feature selection, and have rarely addressed big noisy data. Consequently, this work proposes a hybrid memetic algorithm (MA) with variable neighborhood search (VNS) to select instances and features simultaneously; MA performs excellently in data selection, and VNS has been shown to perform well in local search. To evaluate the proposed algorithm, this work creates simulation data by combining datasets from the UCI repository with noisy data. The proposed algorithm is used to reduce the simulation data, and the reduced data is then used to train a predictive model whose performance is evaluated in testing. Compared with other metaheuristics, the proposed algorithm achieves a balance between exploration and exploitation. Additionally, the results show that the proposed algorithm is more robust than other feature selection methods. © 2021 Elsevier B.V. All rights reserved.
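The abstract does not give the algorithm's details, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes a binary mask over training instances and features, a 1-nearest-neighbor fitness with a hypothetical weight `alpha` that trades accuracy against data reduction, a plain genetic loop with elitism, and a bit-flip VNS as the local search. All function names, parameters, and the toy data are illustrative assumptions.

```python
import random

def knn1_accuracy(train, valid, inst_mask, feat_mask):
    """1-NN accuracy on `valid` using only the selected rows and features."""
    rows = [r for r, keep in zip(train, inst_mask) if keep]
    feats = [j for j, keep in enumerate(feat_mask) if keep]
    if not rows or not feats:
        return 0.0
    hits = 0
    for x, y in valid:
        nearest = min(rows, key=lambda r: sum((r[0][j] - x[j]) ** 2 for j in feats))
        hits += nearest[1] == y
    return hits / len(valid)

def fitness(sol, train, valid, n_inst, alpha=0.9):
    """Assumed objective: weighted sum of accuracy and fraction of data removed."""
    acc = knn1_accuracy(train, valid, sol[:n_inst], sol[n_inst:])
    reduction = 1.0 - sum(sol) / len(sol)
    return alpha * acc + (1 - alpha) * reduction

def vns(sol, score, evaluate, rng, k_max=3, tries=5):
    """Variable neighborhood search: neighborhood k flips k random bits."""
    k = 1
    while k <= k_max:
        improved = False
        for _ in range(tries):
            cand = sol[:]
            for i in rng.sample(range(len(sol)), k):
                cand[i] = 1 - cand[i]
            s = evaluate(cand)
            if s > score:  # accept only strict improvements
                sol, score, improved = cand, s, True
                break
        k = 1 if improved else k + 1  # restart at k=1 after an improvement
    return sol, score

def memetic_vns(train, valid, n_feat, pop_size=8, generations=10, seed=0):
    rng = random.Random(seed)
    n_inst = len(train)
    n = n_inst + n_feat
    evaluate = lambda s: fitness(s, train, valid, n_inst)
    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    pop[0] = [1] * n  # keep the "select everything" baseline in the population
    scored = sorted(((evaluate(s), s) for s in pop), key=lambda t: t[0], reverse=True)
    for _ in range(generations):
        parents = scored[: pop_size // 2]
        children = []
        while len(children) < pop_size - 1:
            (_, a), (_, b) = rng.sample(parents, 2)
            child = [x if rng.random() < 0.5 else y for x, y in zip(a, b)]  # uniform crossover
            i = rng.randrange(n)
            child[i] = 1 - child[i]  # single-bit mutation
            child, s = vns(child, evaluate(child), evaluate, rng)  # memetic local search
            children.append((s, child))
        # Elitism: the best solution so far always survives.
        scored = sorted([scored[0]] + children, key=lambda t: t[0], reverse=True)
    best_score, best = scored[0]
    return best[:n_inst], best[n_inst:], best_score

def make_toy_data(n, rng, noise_rate):
    """Two informative features, three pure-noise features, optional label noise."""
    data = []
    for _ in range(n):
        y = rng.randint(0, 1)
        x = [y + rng.gauss(0, 0.3), y + rng.gauss(0, 0.3)] + [rng.gauss(0, 1) for _ in range(3)]
        if rng.random() < noise_rate:
            y = 1 - y  # flip the label to simulate a noisy instance
        data.append((x, y))
    return data

data_rng = random.Random(42)
train = make_toy_data(30, data_rng, noise_rate=0.2)
valid = make_toy_data(20, data_rng, noise_rate=0.0)
inst_mask, feat_mask, best_score = memetic_vns(train, valid, n_feat=5)
print(sum(inst_mask), "instances and", sum(feat_mask), "features kept; fitness", round(best_score, 3))
```

Because the all-ones solution is seeded into the initial population and the best individual is always carried over, the returned fitness in this sketch can never fall below that of training on the unreduced data.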
Pages: 15
Related papers
50 in total
  • [1] A Scalable Memetic Algorithm for Simultaneous Instance and Feature Selection
    Garcia-Pedrajas, Nicolas
    de Haro-Garcia, Aida
    Perez-Rodriguez, Javier
    [J]. EVOLUTIONARY COMPUTATION, 2014, 22 (01) : 1 - 45
  • [2] Feature selection in machine learning via variable neighborhood search
    Syed, Mujahid N.
    [J]. OPTIMIZATION LETTERS, 2023, 17 (09) : 2321 - 2345
  • [4] An Enhanced Memetic Algorithm for Feature Selection in Big Data Analytics with MapReduce
    Ramakrishnan, Umanesan
    Nachimuthu, Nandhagopal
    [J]. INTELLIGENT AUTOMATION AND SOFT COMPUTING, 2022, 31 (03) : 1547 - 1559
  • [6] A hybridization of clonal selection algorithm with iterated local search and variable neighborhood search for the feature selection problem
    Marinaki, Magdalene
    Marinakis, Yannis
    [J]. MEMETIC COMPUTING, 2015, 7 (03) : 181 - 201
  • [7] Accelerated Pattern Search with Variable Solution Size for Simultaneous Instance Selection and Generation
    Le, Hoang Lam
    Neri, Ferrante
    Landa-Silva, Dario
    Triguero, Isaac
    [J]. PROCEEDINGS OF THE 2022 GENETIC AND EVOLUTIONARY COMPUTATION CONFERENCE COMPANION, GECCO 2022, 2022 : 256 - 259
  • [8] Simultaneous instance and feature selection for improving prediction in special education data
    Villuendas-Rey, Yenny
    Rey-Benguria, Carmen
    Lytras, Miltiadis
    Yanez-Marquez, Cornelio
    Camacho-Nieto, Oscar
    [J]. PROGRAM-ELECTRONIC LIBRARY AND INFORMATION SYSTEMS, 2017, 51 (03) : 278 - 297
  • [9] High-dimensional feature selection via feature grouping: A Variable Neighborhood Search approach
    Garcia-Torres, Miguel
    Gomez-Vela, Francisco
    Melian-Batista, Belen
    Marcos Moreno-Vega, J.
    [J]. INFORMATION SCIENCES, 2016, 326 : 102 - 118
  • [10] A niching memetic algorithm for simultaneous clustering and feature selection
    Sheng, Weiguo
    Liu, Xiaohui
    Fairhurst, Michael
    [J]. IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2008, 20 (07) : 868 - 879