Spark Rough Hypercuboid Approach for Scalable Feature Selection

Cited by: 20
Authors
Luo, Chuan [1]
Wang, Sizhao [1]
Li, Tianrui [2]
Chen, Hongmei [2]
Lv, Jiancheng [1]
Yi, Zhang [1]
Affiliations
[1] Sichuan Univ, Coll Comp Sci, Chengdu 610065, Sichuan, Peoples R China
[2] Southwest Jiaotong Univ, Sch Comp & Artificial Intelligence, Chengdu 611756, Sichuan, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Feature extraction; Rough sets; Partitioning algorithms; Computational modeling; Approximation algorithms; Parallel processing; Heuristic algorithms; Rough hypercuboid; feature selection; parallel computing; scalability; apache spark; ATTRIBUTE REDUCTION; DEPENDENCY CALCULATION; APPROXIMATION; SETS;
DOI
10.1109/TKDE.2021.3112520
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Feature selection refers to choosing an optimal non-redundant feature subset with minimal degradation of learning performance and maximal avoidance of data overfitting. The explosive growth of data makes the sequential execution of algorithms extremely time-consuming, which necessitates scalable parallelization of algorithms that efficiently exploits distributed computational capabilities. In this paper, we present parallel feature selection algorithms underpinned by a rough hypercuboid approach in order to scale to growing data volumes. Metrics defined in terms of the rough hypercuboid are highly suitable for parallel distributed processing and fit well with the Apache Spark cluster computing paradigm. Two data parallelism strategies, namely vertical partitioning and horizontal partitioning, are implemented to decompose the data into concurrent iterative computing streams. Experimental results on representative datasets show that our algorithms are significantly faster than their original sequential counterpart while guaranteeing the quality of the results. Furthermore, the proposed algorithms are capable of exploiting distributed-memory clusters to accomplish computation tasks that fail on a single node due to memory constraints. Parallel scalability and extensibility analyses confirm that our parallelization extends well to massive amounts of data and scales well as the number of computational nodes increases.
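The two data parallelism strategies named in the abstract can be illustrated with a small sketch. It is not the paper's algorithm or metric: it assumes a simplified stand-in statistic (the per-class [min, max] interval of each feature), uses plain Python instead of Spark, and every name and data value in it (`ROWS`, `merge_ranges`, `ranges_for_rows`, `horizontal`, `vertical`) is hypothetical. It only shows that a row-wise split followed by a merge and a column-wise split can produce the same result.

```python
from functools import reduce

# Toy dataset: each row is (feature_values, class_label). All values are made up.
ROWS = [
    ([1.0, 5.0], "a"),
    ([2.0, 7.0], "a"),
    ([4.0, 6.0], "b"),
    ([3.0, 9.0], "b"),
]

def merge_ranges(left, right):
    """Merge two {(feature, label): (lo, hi)} dicts by widening intervals."""
    merged = dict(left)
    for key, (lo, hi) in right.items():
        if key in merged:
            cur_lo, cur_hi = merged[key]
            merged[key] = (min(cur_lo, lo), max(cur_hi, hi))
        else:
            merged[key] = (lo, hi)
    return merged

def ranges_for_rows(rows):
    """Per-class [min, max] interval of every feature within one block of rows."""
    out = {}
    for values, label in rows:
        for j, v in enumerate(values):
            out = merge_ranges(out, {(j, label): (v, v)})
    return out

def horizontal(rows, n_parts):
    """Split by rows, compute partial intervals per block, then reduce."""
    blocks = [rows[i::n_parts] for i in range(n_parts)]
    return reduce(merge_ranges, map(ranges_for_rows, blocks), {})

def vertical(rows):
    """Split by feature: each column is processed independently of the others."""
    n_features = len(rows[0][0])
    out = {}
    for j in range(n_features):  # each iteration could run on a separate worker
        column = [([values[j]], label) for values, label in rows]
        for (_, label), interval in ranges_for_rows(column).items():
            out[(j, label)] = interval
    return out

h = horizontal(ROWS, 2)
v = vertical(ROWS)
```

In this sketch the horizontal strategy needs a merge step because every block sees only some of the rows, while the vertical strategy needs none because each feature's statistic depends on that column alone. On a Spark cluster the `map` and `reduce` steps would correspond to distributed transformations and aggregations, which is an assumption about the mapping rather than a description of the paper's implementation.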
Pages: 3130-3144
Page count: 15