Distributed Ensemble Feature Selection Framework for High-Dimensional and High-Skewed Imbalanced Big Dataset

Cited by: 1
Authors
Soheili, Majid [1]
Haeri, Maryam Amir [2]
Affiliations
[1] Islamic Azad Univ, Comp Engn Dept, Neka Branch, Neka, Iran
[2] Univ Twente, Learning Data Analyt & Technol Dept, Enschede, Netherlands
Keywords
Scalable Feature Selection; Distributed Ensemble Learning; Imbalanced Big Data Set;
DOI
10.1109/SSCI50451.2021.9659937
Chinese Library Classification number
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The class-imbalance problem arises when the class labels of a dataset have a skewed distribution. In this situation, the instances of one class, which is usually the class of primary interest, are heavily outnumbered by the instances of the other classes. In recent years, feature selection for high-dimensional imbalanced data has become an attractive research area. Feature selection aims to select an informative feature subset that improves the accuracy of the classification model. Moreover, feature ranking, a subcategory of feature selection, has been studied over the last decade as a way to cope with high-dimensional datasets. On the one hand, most traditional feature selection methods are not scalable, although scalability is critical for large-scale datasets. On the other hand, scalability is an intrinsic characteristic of the ensemble learning approach. This paper proposes a Distributed Ensemble Imbalanced feature selection framework, called DEIM, to deal with big imbalanced datasets. First, DEIM transforms the default data partitions into representative partitions in a single pass. Second, it applies a feature ranking method to each partition independently, using a bagging approach. Finally, it fuses the intermediate feature rankings using a stacking strategy. In this paper, two traditional feature ranking algorithms, ReliefF and QPFS, are plugged into DEIM, producing two methods: DEIM-Relief and DEIM-QPFS. Experiments are conducted on three big imbalanced datasets on a computer cluster. The empirical study shows that the resulting methods are scalable. They also have lower execution times than DiReliefF and DQPFS, and their final results induce better classification models.
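The three stages named in the abstract (representative partitions, per-partition bagged ranking, rank fusion) can be illustrated with a minimal single-process sketch. It is not the authors' implementation and rests on several assumptions: all function names are hypothetical, the scorer is a simple class-mean difference standing in for ReliefF or QPFS, the fusion step sums ranks instead of using the paper's stacking strategy, labels are assumed to be binary (0/1), and nothing here is distributed across a cluster.

```python
import random
from collections import defaultdict


def representative_partitions(labels, n_parts, seed=0):
    """Deal the indices of each class round-robin across partitions so that
    every partition keeps roughly the global class ratio (an assumed reading
    of "representative partitions")."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    parts = [[] for _ in range(n_parts)]
    for idxs in by_class.values():
        rng.shuffle(idxs)
        for k, i in enumerate(idxs):
            parts[k % n_parts].append(i)
    return parts


def score_features(X, y, idxs):
    """Placeholder ranker (NOT ReliefF or QPFS): absolute difference between
    the class means of each feature, for binary labels 0/1."""
    n_feat = len(X[0])
    pos = [i for i in idxs if y[i] == 1]
    neg = [i for i in idxs if y[i] == 0]
    if not pos or not neg:
        # A bag without both classes carries no ranking information.
        return [0.0] * n_feat
    scores = []
    for j in range(n_feat):
        mean_pos = sum(X[i][j] for i in pos) / len(pos)
        mean_neg = sum(X[i][j] for i in neg) / len(neg)
        scores.append(abs(mean_pos - mean_neg))
    return scores


def to_ranks(scores):
    """Convert scores to ranks, where rank 0 is the highest score."""
    order = sorted(range(len(scores)), key=lambda j: -scores[j])
    ranks = [0] * len(scores)
    for rank, j in enumerate(order):
        ranks[j] = rank
    return ranks


def ensemble_rank(X, y, n_parts=3, n_bags=5, seed=0):
    """Rank features on bootstrap bags inside each partition, then fuse the
    intermediate rankings by summing ranks (a simple stand-in for stacking)."""
    rng = random.Random(seed)
    n_feat = len(X[0])
    total = [0] * n_feat
    for part in representative_partitions(y, n_parts, seed):
        for _ in range(n_bags):
            bag = [rng.choice(part) for _ in part]  # sample with replacement
            for j, rank in enumerate(to_ranks(score_features(X, y, bag))):
                total[j] += rank
    # Feature indices ordered from most to least relevant.
    return sorted(range(n_feat), key=lambda j: total[j])
```

Calling `ensemble_rank(X, y)` on a list of feature rows `X` and a 0/1 label list `y` returns feature indices ordered from most to least relevant. In a real distributed setting, the loop over partitions is the part that would run in parallel on separate workers, with only the small per-partition rankings sent back for fusion.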
Pages: 8