Spark Rough Hypercuboid Approach for Scalable Feature Selection

Cited by: 20
Authors
Luo, Chuan [1 ]
Wang, Sizhao [1 ]
Li, Tianrui [2 ]
Chen, Hongmei [2 ]
Lv, Jiancheng [1 ]
Yi, Zhang [1 ]
Affiliations
[1] Sichuan Univ, Coll Comp Sci, Chengdu 610065, Sichuan, Peoples R China
[2] Southwest Jiaotong Univ, Sch Comp & Artificial Intelligence, Chengdu 611756, Sichuan, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Feature extraction; Rough sets; Partitioning algorithms; Computational modeling; Approximation algorithms; Parallel processing; Heuristic algorithms; Rough hypercuboid; feature selection; parallel computing; scalability; Apache Spark; ATTRIBUTE REDUCTION; DEPENDENCY CALCULATION; APPROXIMATION; SETS;
DOI
10.1109/TKDE.2021.3112520
Chinese Library Classification number
TP18 [Artificial Intelligence Theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Feature selection refers to choosing an optimal non-redundant feature subset with minimal degradation of learning performance and maximal avoidance of data overfitting. With the explosive growth of data, the sequential execution of algorithms becomes extremely time-consuming, which necessitates scalable parallelization of algorithms that efficiently exploits distributed computational capabilities. In this paper, we present parallel feature selection algorithms underpinned by a rough hypercuboid approach in order to scale to growing data volumes. Metrics defined in terms of the rough hypercuboid are highly suitable for parallel distributed processing and fit well with the Apache Spark cluster computing paradigm. Two data parallelism strategies, namely vertical partitioning and horizontal partitioning, are implemented to decompose the data into concurrent iterative computing streams. Experimental results on representative datasets show that our algorithms are significantly faster than their original sequential counterpart while guaranteeing the quality of the results. Furthermore, the proposed algorithms are capable of exploiting distributed-memory clusters to accomplish computation tasks that fail on a single node due to memory constraints. Parallel scalability and extensibility analyses confirm that our parallelization extends well to massive amounts of data and scales well as the number of computational nodes increases.
Pages: 3130 - 3144
Number of pages: 15