Spark Rough Hypercuboid Approach for Scalable Feature Selection

Cited by: 20
Authors
Luo, Chuan [1]
Wang, Sizhao [1]
Li, Tianrui [2]
Chen, Hongmei [2]
Lv, Jiancheng [1]
Yi, Zhang [1]
Affiliations
[1] Sichuan Univ, Coll Comp Sci, Chengdu 610065, Sichuan, Peoples R China
[2] Southwest Jiaotong Univ, Sch Comp & Artificial Intelligence, Chengdu 611756, Sichuan, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Feature extraction; Rough sets; Partitioning algorithms; Computational modeling; Approximation algorithms; Parallel processing; Heuristic algorithms; Rough hypercuboid; Feature selection; Parallel computing; Scalability; Apache Spark; Attribute reduction; Dependency calculation; Approximation; Sets
DOI
10.1109/TKDE.2021.3112520
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Feature selection refers to choosing an optimal, non-redundant feature subset with minimal degradation of learning performance and maximal avoidance of overfitting. The explosion of data volumes makes the sequential execution of such algorithms extremely time-consuming, which necessitates scalable parallelization that efficiently exploits distributed computational capabilities. In this paper, we present parallel feature selection algorithms underpinned by a rough hypercuboid approach in order to scale to growing data volumes. Metrics based on the rough hypercuboid are highly suitable for parallel distributed processing and fit well with the Apache Spark cluster computing paradigm. Two data parallelism strategies, namely vertical partitioning and horizontal partitioning, are implemented to decompose the data into concurrent iterative computing streams. Experimental results on representative datasets show that our algorithms are significantly faster than their original sequential counterpart while guaranteeing the quality of the results. Furthermore, the proposed algorithms can exploit distributed-memory clusters to accomplish computation tasks that fail on a single node due to memory constraints. Parallel scalability and extensibility analyses confirm that our parallelization extends well to massive amounts of data and scales well as the number of computational nodes increases.
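To make the horizontal-partitioning idea concrete, here is a minimal, self-contained Python sketch. It is not the authors' Spark implementation. It assumes the commonly used rough hypercuboid formulation: for each feature, every class induces an interval [min, max] over its objects; an object lies in the boundary region of a feature subset when it falls inside the hypercuboids (the intervals on all selected features) of more than one class; and the dependency degree is 1 minus the boundary fraction. Plain Python lists stand in for data partitions. The function names (`class_intervals`, `merge_intervals`, `boundary_count`, `dependency`, `forward_select`) and the example data are hypothetical. The greedy dependency-maximizing search is an illustrative stand-in, not the paper's selection criterion.

```python
from functools import reduce

# A row is (feature_vector, class_label); a "partition" is a list of rows.
# Hypothetical example data: two horizontal partitions of the same table.
parts = [
    [((0.1, 5.0), "a"), ((0.2, 1.0), "a"), ((0.9, 3.0), "b")],
    [((0.3, 2.0), "a"), ((0.8, 4.0), "b"), ((0.7, 0.5), "b")],
]

def class_intervals(partition, features):
    """Per-partition step: (class, feature) -> (min, max) over local rows."""
    stats = {}
    for x, y in partition:
        for f in features:
            lo, hi = stats.get((y, f), (x[f], x[f]))
            stats[(y, f)] = (min(lo, x[f]), max(hi, x[f]))
    return stats

def merge_intervals(a, b):
    """Merge step: combine two partial interval tables into one."""
    out = dict(a)
    for key, (lo, hi) in b.items():
        if key in out:
            out[key] = (min(out[key][0], lo), max(out[key][1], hi))
        else:
            out[key] = (lo, hi)
    return out

def boundary_count(partition, features, intervals, classes):
    """Per-partition step: count rows inside more than one class hypercuboid."""
    count = 0
    for x, _ in partition:
        hits = sum(
            all(intervals[(c, f)][0] <= x[f] <= intervals[(c, f)][1] for f in features)
            for c in classes
        )
        if hits > 1:
            count += 1
    return count

def dependency(partitions, features):
    """Assumed dependency degree: 1 - |boundary region| / |universe|."""
    classes = sorted({y for p in partitions for _, y in p})
    intervals = reduce(merge_intervals, (class_intervals(p, features) for p in partitions))
    n = sum(len(p) for p in partitions)
    boundary = sum(boundary_count(p, features, intervals, classes) for p in partitions)
    return 1 - boundary / n

def forward_select(partitions, num_features, max_features):
    """Hypothetical greedy search: add the feature that most increases dependency."""
    selected, best = [], 0.0
    while len(selected) < max_features:
        candidates = [f for f in range(num_features) if f not in selected]
        if not candidates:
            break
        scores = {f: dependency(partitions, selected + [f]) for f in candidates}
        f_best = max(candidates, key=scores.get)
        if scores[f_best] <= best:
            break  # no candidate improves the dependency degree
        selected.append(f_best)
        best = scores[f_best]
        if best == 1.0:
            break  # boundary region is empty; the subset is fully consistent
    return selected, best

if __name__ == "__main__":
    print(dependency(parts, [0]))  # feature 0 separates the two classes
    print(dependency(parts, [1]))  # feature 1 overlaps heavily across classes
    print(forward_select(parts, num_features=2, max_features=2))
```

In this sketch, each per-partition function needs only its own rows plus a small table of class intervals. The only cross-partition traffic is therefore the interval merge and a sum of boundary counts. In PySpark, the per-partition steps would correspond roughly to `RDD.mapPartitions` and the merges to `RDD.reduce`. Vertical partitioning would instead distribute feature columns, so that candidate features are scored in parallel.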
Pages: 3130-3144
Number of pages: 15