Feature Subset Selection for High-Dimensional, Low Sampling Size Data Classification Using Ensemble Feature Selection With a Wrapper-Based Search

Cited: 0
Authors
Mandal, Ashis Kumar [1 ,2 ]
Nadim, MD. [1 ,2 ]
Saha, Hasi [2 ]
Sultana, Tangina [3 ,4 ]
Hossain, Md. Delowar [2 ,4 ]
Huh, Eui-Nam [4 ]
Affiliations
[1] Univ Saskatchewan, Dept Comp Sci, Saskatoon, SK S7N 5A2, Canada
[2] Hajee Mohammad Danesh Sci & Technol Univ, Dept Comp Sci & Engn, Dinajpur 5200, Bangladesh
[3] Hajee Mohammad Danesh Sci & Technol Univ, Dept Elect & Commun Engn, Dinajpur 5200, Bangladesh
[4] Kyung Hee Univ, Dept Comp Sci & Engn, Yongin 17104, South Korea
Source
IEEE ACCESS | 2024, Vol. 12
Keywords
Feature extraction; Support vector machines; Metaheuristics; Search problems; Filtering algorithms; Information filters; Classification algorithms; Classification; differential evolution; feature selection; filter approach; HDLSS data; wrapper approach; SUPPORT VECTOR MACHINES; DIFFERENTIAL EVOLUTION; OPTIMIZATION; ALGORITHMS;
DOI
10.1109/ACCESS.2024.3390684
CLC Number
TP [Automation Technology, Computer Technology];
Subject Classification Code
0812;
Abstract
The identification of suitable feature subsets from High-Dimensional Low-Sample-Size (HDLSS) data is of paramount importance because such datasets often contain numerous redundant and irrelevant features, leading to poor classification performance. However, selecting an optimal feature subset from a vast feature space poses a significant computational challenge. In the HDLSS domain, conventional feature selection methods often struggle to balance reducing the number of features against preserving high classification accuracy. To address these issues, this study introduces an effective framework that employs a combined filter- and wrapper-based strategy designed specifically for the classification challenges inherent in HDLSS data. The framework adopts a multi-step approach in which ensemble feature selection integrates five filter ranking approaches: Chi-square (χ²), Gini Index (GI), F-score, Mutual Information (MI), and Symmetric Uncertainty (SU), to identify the top-ranking features. In the subsequent stage, a wrapper-based search method is applied, using the Differential Evolution (DE) metaheuristic algorithm as the search strategy. The fitness of feature subsets during this search is assessed by a weighted combination of the Support Vector Machine (SVM) classifier's error rate and the ratio of feature cardinality. The dimensionality-reduced datasets are then used to construct classification models with SVM, K-Nearest Neighbors (KNN), and Logistic Regression (LR). The approach was evaluated on 13 HDLSS datasets to assess its efficacy in selecting appropriate feature subsets and improving Classification Accuracy (ACC) along with Area Under the Curve (AUC). Results show that the proposed ensemble-plus-wrapper approach selects a small number of features (between 2 and 9 for all datasets) while maintaining commendable average AUC and ACC (between 98% and 100%).
The comparative analysis reveals that the proposed method surpasses both ensemble feature selection and non-feature selection approaches in terms of feature reduction and ACC. Additionally, when compared to various other state-of-the-art methods, this approach demonstrates commendable performance.
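The wrapper stage described in the abstract (a DE search whose fitness is a weighted combination of classifier error rate and feature-cardinality ratio) can be sketched as follows. This is a minimal, self-contained illustration, not the authors' implementation: the toy dataset, the leave-one-out k-NN surrogate (standing in for the paper's SVM classifier), and the weight `w = 0.9` are all assumptions made for the sketch.

```python
import random

random.seed(0)

# Toy stand-in for an HDLSS dataset: features 0 and 1 are informative,
# the remaining 8 are pure noise. (Assumed data; the paper evaluates on
# 13 real HDLSS benchmarks.)
def make_sample(label):
    informative = [label + random.gauss(0, 0.3), -label + random.gauss(0, 0.3)]
    noise = [random.gauss(0, 1) for _ in range(8)]
    return informative + noise, label

data = [make_sample(lab) for lab in (0, 1) for _ in range(20)]
N_FEAT = 10

def loo_knn_error(mask, k=3):
    """Leave-one-out k-NN error rate on the features selected by `mask`
    (a cheap surrogate for the SVM error rate used in the paper)."""
    idx = [f for f in range(N_FEAT) if mask[f]]
    if not idx:
        return 1.0  # no features selected -> worst possible error
    errors = 0
    for i, (x, y) in enumerate(data):
        neigh = sorted(
            (sum((x[f] - x2[f]) ** 2 for f in idx), y2)
            for j, (x2, y2) in enumerate(data) if j != i
        )[:k]
        votes = [lab for _, lab in neigh]
        errors += max(set(votes), key=votes.count) != y
    return errors / len(data)

def fitness(mask, w=0.9):
    """Weighted combination of classifier error and feature-cardinality
    ratio, as described in the abstract; w = 0.9 is an assumed weight."""
    return w * loo_knn_error(mask) + (1 - w) * sum(mask) / N_FEAT

def de_feature_search(pop_size=10, gens=15, F=0.5, CR=0.9):
    """Binary DE wrapper search: real-valued vectors in [0, 1] are
    thresholded at 0.5 to obtain feature masks."""
    pop = [[random.random() for _ in range(N_FEAT)] for _ in range(pop_size)]
    fits = [fitness([v > 0.5 for v in ind]) for ind in pop]
    for _ in range(gens):
        for i in range(pop_size):
            a, b, c = random.sample([j for j in range(pop_size) if j != i], 3)
            trial = [
                min(max(pop[a][d] + F * (pop[b][d] - pop[c][d]), 0.0), 1.0)
                if random.random() < CR else pop[i][d]
                for d in range(N_FEAT)
            ]
            f = fitness([v > 0.5 for v in trial])
            if f <= fits[i]:  # greedy one-to-one survivor selection
                pop[i], fits[i] = trial, f
    best = min(range(pop_size), key=fits.__getitem__)
    selected = [d for d in range(N_FEAT) if pop[best][d] > 0.5]
    return selected, fits[best]

selected, best_fit = de_feature_search()
print("selected features:", selected, "fitness:", round(best_fit, 3))
```

Because the cardinality term penalizes every selected feature, the search is pushed toward compact subsets, mirroring the paper's 2-to-9-feature results; the filter-ensemble pre-ranking stage (not shown) would first shrink `N_FEAT` before DE runs.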
Pages: 62341-62357
Number of pages: 17
Related Papers
50 in total
  • [1] Feature Selection on High Dimensional Data using Wrapper Based Subset Selection
    Manikandan, G.
    Susi, E.
    Abirami, S.
    [J]. 2017 SECOND INTERNATIONAL CONFERENCE ON RECENT TRENDS AND CHALLENGES IN COMPUTATIONAL MODELS (ICRTCCM), 2017, : 320 - 325
  • [2] Ranking-based Feature Selection with Wrapper PSO Search in High-Dimensional Data Classification
    Saw, Thinzar
    Oo, Win Mar
    [J]. IAENG International Journal of Computer Science, 2023, 50 (01)
  • [3] Search space division method for wrapper feature selection on high-dimensional data classification
    Chaudhuri, Abhilasha
    [J]. KNOWLEDGE-BASED SYSTEMS, 2024, 291
  • [4] Improving performance for classification with incomplete data using wrapper-based feature selection
    Tran, C. T.
    Zhang, M.
    Andreae, P.
    Xue, B.
    [J]. Evolutionary Intelligence, 2016, 9 (3) : 81 - 94
  • [5] Stability of Filter- and Wrapper-Based Feature Subset Selection
    Wald, Randall
    Khoshgoftaar, Taghi M.
    Napolitano, Amri
    [J]. 2013 IEEE 25TH INTERNATIONAL CONFERENCE ON TOOLS WITH ARTIFICIAL INTELLIGENCE (ICTAI), 2013, : 374 - 380
  • [6] Wrapper-based Feature Selection for Imbalanced Data using Binary Queuing Search Algorithm
    Thaher, Thaer
    Mafarja, Majdi
    Abdalhaq, Baker
    Chantar, Hamouda
    [J]. 2019 2ND INTERNATIONAL CONFERENCE ON NEW TRENDS IN COMPUTING SCIENCES (ICTCS), 2019, : 318 - 323
  • [7] Wrapper-Based Feature Subset Selection for Rapid Image Information Mining
    Durbha, Surya S.
    King, Roger L.
    Younan, Nicolas H.
    [J]. IEEE GEOSCIENCE AND REMOTE SENSING LETTERS, 2010, 7 (01) : 43 - 47
  • [8] A Novel Wrapper-Based Optimization Algorithm for the Feature Selection and Classification
    Talpur, Noureen
    Abdulkadir, Said Jadid
    Hasan, Mohd Hilmi
    Alhussian, Hitham
    Alwadain, Ayed
    [J]. CMC-COMPUTERS MATERIALS & CONTINUA, 2023, 74 (03): : 5799 - 5820
  • [9] Improving Incremental Wrapper-Based Feature Subset Selection by Using Re-ranking
    Bermejo, Pablo
    Gamez, Jose A.
    Puerta, Jose M.
    [J]. TRENDS IN APPLIED INTELLIGENT SYSTEMS, PT I, PROCEEDINGS, 2010, 6096 : 580 - 589
  • [10] Feature selection based on dynamic crow search algorithm for high-dimensional data classification
    Jiang, He
    Yang, Ye
    Wan, Qiuying
    Dong, Yao
    [J]. EXPERT SYSTEMS WITH APPLICATIONS, 2024, 250