Distributed Evolutionary Feature Selection for Big Data Processing

Cited by: 0
Authors
Bouaguel, Waad [1, 2]
Ben NCir, Chiheb Eddine [1, 2]
Affiliations
[1] Univ Jeddah, Coll Business, Jeddah, Saudi Arabia
[2] Univ Tunis, ISG, LARODEC, Tunis, Tunisia
Keywords
Genetic algorithm; feature selection; island model; parallel processing
DOI
10.1142/S2196888822500154
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Feature selection has become a powerful dimensionality reduction strategy and an effective tool for handling high-dimensional data. It reduces the dimension of the feature space, speeding up the learning model and lowering its cost, by selecting the feature subset most relevant to data mining and machine learning tasks. Selecting the optimal feature subset is an optimization problem that has been proven NP-hard. Metaheuristics are traditionally used to deal with NP-hard problems, since they are well known for solving complex, real-world problems in a reasonable amount of time. The genetic algorithm (GA) is one of the most popular metaheuristics and has proven effective for accurate feature selection. However, over the last few decades, data have grown progressively larger in both the number of instances and the number of features, a paradigm popularly termed Big Data. With this tremendous growth in dataset size, most current feature selection algorithms, and GA in particular, fail to scale. To improve the scalability of feature selection on big data, a distributed computing strategy, such as the MapReduce model and the Hadoop system, is typically adopted. In this paper, we first review the most recent works that apply parallel genetic algorithms to large datasets. We then propose a new parallel genetic algorithm based on the coarse-grained parallelization model (island model). The process is parallelized, and the data partitions are distributed, using the Hadoop system on an Amazon cluster. The performance and scalability of the proposed method were compared, theoretically and empirically, with those of existing feature selection methods on large-scale datasets, and the results confirm the effectiveness of the proposed method.
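The abstract names the island model but does not give its operators, so the following is only a minimal, single-process Python sketch of a generic island-model GA for feature selection, not the authors' method. It assumes a bitmask encoding (one bit per feature), tournament selection, one-point crossover, bit-flip mutation, elitism and ring migration; all function names, parameters and the toy fitness (a hypothetical stand-in for a classifier's validation score) are illustrative assumptions. In the distributed setting the abstract describes, each island would run as its own task on a Hadoop node and data partition; here the islands simply run in a loop.

```python
import random
from typing import Callable, List, Set

# A candidate solution: one bit per feature (1 = selected, 0 = dropped).
Individual = List[int]
Fitness = Callable[[Individual], float]


def toy_fitness(ind: Individual, relevant: Set[int]) -> float:
    """Hypothetical stand-in for a classifier's validation score:
    +1 per relevant feature kept, -0.5 per irrelevant feature kept."""
    hits = sum(1 for i, bit in enumerate(ind) if bit and i in relevant)
    noise = sum(1 for i, bit in enumerate(ind) if bit and i not in relevant)
    return hits - 0.5 * noise


def tournament(pop: List[Individual], fit: List[float], rng: random.Random, k: int = 3) -> Individual:
    """Return the fittest of k randomly drawn individuals."""
    contenders = rng.sample(range(len(pop)), k)
    return pop[max(contenders, key=lambda i: fit[i])]


def evolve_island(pop: List[Individual], fitness: Fitness, rng: random.Random,
                  generations: int, p_mut: float) -> List[Individual]:
    """Run a plain generational GA on one island (subpopulation)."""
    for _ in range(generations):
        fit = [fitness(ind) for ind in pop]
        elite = pop[max(range(len(pop)), key=lambda i: fit[i])]
        children = [elite[:]]  # elitism: the island's best survives unchanged
        while len(children) < len(pop):
            a = tournament(pop, fit, rng)
            b = tournament(pop, fit, rng)
            cut = rng.randrange(1, len(a))  # one-point crossover
            child = a[:cut] + b[cut:]
            # Bit-flip mutation: toggle each feature with probability p_mut.
            child = [1 - g if rng.random() < p_mut else g for g in child]
            children.append(child)
        pop = children
    return pop


def island_ga(n_features: int, fitness: Fitness, n_islands: int = 4, pop_size: int = 20,
              epochs: int = 10, gens_per_epoch: int = 5, p_mut: float = 0.02,
              seed: int = 0) -> Individual:
    rng = random.Random(seed)
    islands = [[[rng.randint(0, 1) for _ in range(n_features)] for _ in range(pop_size)]
               for _ in range(n_islands)]
    for _ in range(epochs):
        # Islands evolve independently between migrations; in a distributed
        # deployment each would be a separate task (here they run in a loop).
        islands = [evolve_island(pop, fitness, rng, gens_per_epoch, p_mut) for pop in islands]
        # Ring migration: a copy of island i-1's best replaces island i's worst.
        bests = [max(pop, key=fitness) for pop in islands]
        for i, pop in enumerate(islands):
            worst = min(range(len(pop)), key=lambda j: fitness(pop[j]))
            pop[worst] = bests[(i - 1) % n_islands][:]
    return max((ind for pop in islands for ind in pop), key=fitness)


if __name__ == "__main__":
    relevant = {0, 3, 7, 12}  # hypothetical ground truth for the toy fitness
    best = island_ga(20, lambda ind: toy_fitness(ind, relevant))
    print("selected features:", [i for i, bit in enumerate(best) if bit])
```

Migrating a single copy per epoch lets good subsets spread between islands while each island keeps its own diversity. In a real wrapper setting, the toy fitness would be replaced by training and scoring a classifier on the selected features, which is the expensive step that motivates distributing the islands.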
Pages: 313-332
Number of pages: 20
Related papers
  • [1] Data Feature Selection Methods on Distributed Big Data Processing Platforms
    Catalkaya, Mehmet Burak
    Kalipsiz, Oya
    Aktas, Mehmet S.
    Turgut, Umut Orcun
    [J]. 2018 3RD INTERNATIONAL CONFERENCE ON COMPUTER SCIENCE AND ENGINEERING (UBMK), 2018, : 133 - 138
  • [2] Evolutionary Feature Selection for Big Data Classification: A MapReduce Approach
    Peralta, Daniel
    del Rio, Sara
    Ramirez-Gallego, Sergio
    Triguero, Isaac
    Benitez, Jose M.
    Herrera, Francisco
    [J]. MATHEMATICAL PROBLEMS IN ENGINEERING, 2015, 2015
  • [3] Distributed Feature Selection for Efficient Economic Big Data Analysis
    Zhao, Liang
    Chen, Zhikui
    Hu, Yueming
    Min, Geyong
    Jiang, Zhaohua
    [J]. IEEE TRANSACTIONS ON BIG DATA, 2018, 4 (02) : 164 - 176
  • [4] A distributed evolutionary multivariate discretizer for Big Data processing on Apache Spark
    Ramirez-Gallego, S.
    Garcia, S.
    Benitez, J. M.
    Herrera, F.
    [J]. SWARM AND EVOLUTIONARY COMPUTATION, 2018, 38 : 240 - 250
  • [5] Distributed Fuzzy Cognitive Maps for Feature Selection in Big Data Classification
    Haritha, K.
    Judy, M. V.
    Papageorgiou, Konstantinos
    Georgiannis, Vassilis C.
    Papageorgiou, Elpiniki
    [J]. ALGORITHMS, 2022, 15 (10)
  • [6] Distributed Feature Selection for Big Data Using Fuzzy Rough Sets
    Kong, Linghe
    Qu, Wenhao
    Yu, Jiadi
    Zuo, Hua
    Chen, Guihai
    Xiong, Fei
    Pan, Shirui
    Lin, Siyu
    Qiu, Meikang
    [J]. IEEE TRANSACTIONS ON FUZZY SYSTEMS, 2020, 28 (05) : 846 - 857
  • [7] DQPFS: Distributed quadratic programming based feature selection for big data
    Soheili, Majid
    Eftekhari-Moghadam, Amir Masoud
    [J]. JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING, 2020, 138 : 1 - 14
  • [8] A STUDY ON FEATURE SELECTION IN BIG DATA
    Manikandan, R. P. S.
    Kalpana, A. M.
    [J]. 2017 INTERNATIONAL CONFERENCE ON COMPUTER COMMUNICATION AND INFORMATICS (ICCCI), 2017,
  • [9] Feature Selection and Evolutionary Rule Learning for Big Data in Smart Building Energy Management
    Rodriguez-Mier, Pablo
    Mucientes, Manuel
    Bugarin, Alberto
    [J]. COGNITIVE COMPUTATION, 2019, 11 (03) : 418 - 433