Spark Rough Hypercuboid Approach for Scalable Feature Selection

Cited by: 20
Authors
Luo, Chuan [1]
Wang, Sizhao [1]
Li, Tianrui [2]
Chen, Hongmei [2]
Lv, Jiancheng [1]
Yi, Zhang [1]
Affiliations
[1] Sichuan Univ, Coll Comp Sci, Chengdu 610065, Sichuan, Peoples R China
[2] Southwest Jiaotong Univ, Sch Comp & Artificial Intelligence, Chengdu 611756, Sichuan, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Feature extraction; Rough sets; Partitioning algorithms; Computational modeling; Approximation algorithms; Parallel processing; Heuristic algorithms; Rough hypercuboid; Feature selection; Parallel computing; Scalability; Apache Spark; Attribute reduction; Dependency calculation; Approximation; Sets
DOI
10.1109/TKDE.2021.3112520
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Feature selection refers to choosing an optimal non-redundant feature subset with minimal degradation of learning performance and maximal avoidance of data overfitting. The explosive growth of data makes the sequential execution of algorithms extremely time-consuming, which necessitates scalable parallelization of algorithms that efficiently exploits distributed computational capabilities. In this paper, we present parallel feature selection algorithms underpinned by a rough hypercuboid approach in order to scale to growing data volumes. Metrics defined in terms of the rough hypercuboid are highly suitable for parallel distributed processing and fit well with the Apache Spark cluster computing paradigm. Two data parallelism strategies, namely vertical partitioning and horizontal partitioning, are implemented to decompose the data into concurrent iterative computing streams. Experimental results on representative datasets show that our algorithms are significantly faster than their original sequential counterpart while guaranteeing the quality of the results. Furthermore, the proposed algorithms are capable of exploiting distributed-memory clusters to accomplish computation tasks that fail on a single node due to memory constraints. Parallel scalability and extensibility analyses confirm that our parallelization extends well to massive amounts of data and scales well as the number of computational nodes increases.
Pages: 3130-3144
Number of pages: 15