An Efficient Parallel Hybrid Feature Selection Approach for Big Data Analysis

Cited by: 0
Authors
Azaiz, Mohamed Amine [1]
Bensaber, Djamel Amar [1]
Affiliations
[1] Ecole Super Informat, Sidi Bel Abbes, Algeria
Keywords
Big Data Analytics; Feature Selection; Parallel Binary Particle Swarm Optimization; Particle Swarm Optimization; Mutual Information; Algorithm
DOI
10.4018/IJSIR.308291
Chinese Library Classification number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Classification algorithms suffer from high runtime complexity when the data are high-dimensional, especially in the context of big data. Feature selection (FS) is a technique for reducing dimensionality and improving learning performance. In this paper, the authors propose a hybrid FS algorithm for classification in the context of big data. First, only the most relevant features are retained, using symmetric uncertainty (SU) as the measure of correlation. The features are distributed into subsets using Apache Spark so that the SU between each feature and the target class is calculated in parallel. Then a binary particle swarm optimization (BPSO) algorithm is used to find the optimal feature subset. Because BPSO has limited convergence and restricted inertia weight adjustment, the authors suggest a multiple inertia weight strategy that influences the changes in particle motion so that the search process is more varied. The authors also propose a parallel fitness evaluation for particles under Spark to accelerate the algorithm. The results show that the proposed FS method achieves higher classification performance with a smaller feature subset in reasonable time.
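
The SU-based filter step described in the abstract can be illustrated with a minimal sketch. It uses the standard definition SU(X, Y) = 2 · I(X; Y) / (H(X) + H(Y)) for discrete variables. The abstract does not give the paper's discretization, relevance threshold, or Spark partitioning, so none of those are reproduced here: the sketch is serial, pure Python, and the function names (`entropy`, `symmetric_uncertainty`, `rank_features`) are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def symmetric_uncertainty(feature, target):
    """SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)), a value between 0 and 1."""
    h_x = entropy(feature)
    h_y = entropy(target)
    if h_x + h_y == 0:
        return 0.0  # both variables are constant, so there is nothing to share
    h_xy = entropy(list(zip(feature, target)))  # joint entropy H(X, Y)
    mutual_info = h_x + h_y - h_xy  # I(X; Y)
    return 2.0 * mutual_info / (h_x + h_y)


def rank_features(columns, target):
    """Return (name, SU) pairs sorted from most to least relevant.

    `columns` maps a feature name to its list of discrete values
    (an assumed input layout, not one taken from the paper).
    """
    scores = {name: symmetric_uncertainty(col, target) for name, col in columns.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    target = [0, 1, 0, 1]
    columns = {
        "copy_of_target": [0, 1, 0, 1],  # fully determines the class
        "independent": [0, 0, 1, 1],     # carries no class information
    }
    for name, su in rank_features(columns, target):
        print(f"{name}: SU = {su:.3f}")
```

Each feature's score depends only on that feature's column and the class column, which is what makes the per-feature computation easy to distribute across workers.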
Pages: 22