Efficient Multiclass Classification Using Feature Selection in High-Dimensional Datasets

Cited by: 5
Authors
Kumar, Ankur [1]
Kaur, Avinash [2]
Singh, Parminder [1, 2]
Driss, Maha [3, 4]
Boulila, Wadii [4, 5]
Affiliations
[1] Lovely Profess Univ, Sch Comp Sci & Engn, Phagwara 144001, India
[2] Univ Mohammed VI Polytech, Sch Comp Sci, Ben Guerir 43150, Morocco
[3] Prince Sultan Univ, Coll Comp & Informat Sci, Secur Engn Lab, Riyadh 12435, Saudi Arabia
[4] Univ Manouba, RIADI Lab, ENSI, Manouba 2010, Tunisia
[5] Prince Sultan Univ, Robot & Internet of Things Lab, Riyadh 12435, Saudi Arabia
Keywords
K-Nearest Neighbor; Logistic Regression; Mutual Information; Sequential Forward Feature Selection; ENSEMBLE; GENERATION; DIAGNOSIS; ALGORITHM; DISEASES
DOI
10.3390/electronics12102290
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Feature selection has become essential in classification problems with numerous features. This process involves removing redundant, noisy, and negatively impacting features from the dataset to enhance the classifier's performance. Some features are less useful than others or do not correlate with the system's evaluation, and their removal does not affect the system's performance. In most cases, removing features with a monotonically decreasing impact on the system's performance increases accuracy. Therefore, this research aims to propose a dimensionality reduction method using a feature selection technique to enhance accuracy. This paper proposes a novel feature-selection approach that combines filter and wrapper techniques to select optimal features using Mutual Information with the Sequential Forward Method and 10-fold cross-validation. Results show that the proposed algorithm can reduce features by more than 75% in datasets with large features and achieve a maximum accuracy of 97%. The algorithm outperforms or performs similarly to existing ones. The proposed algorithm could be a better option for classification problems with minimized features.
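
The filter-plus-wrapper idea in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm, whose details are not given in this record. It assumes scikit-learn, ranks features by mutual information (the filter step), and then makes one greedy forward pass that keeps a feature only if it raises the mean 10-fold cross-validated accuracy (the wrapper step). The function name `mi_forward_selection`, the Wine dataset, and the scaled K-Nearest Neighbor classifier are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.feature_selection import mutual_info_classif
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def mi_forward_selection(X, y, estimator, n_splits=10, random_state=0):
    """Hypothetical helper: MI-ranked greedy forward selection with k-fold CV."""
    # Filter step: score every feature by its mutual information with the label.
    mi = mutual_info_classif(X, y, random_state=random_state)
    order = np.argsort(mi)[::-1]  # most informative feature first

    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=random_state)
    selected, best = [], 0.0
    # Wrapper step: try candidates in MI order and keep a feature only
    # if it improves the mean cross-validated accuracy.
    for j in order:
        trial = selected + [int(j)]
        score = cross_val_score(estimator, X[:, trial], y, cv=cv).mean()
        if score > best:
            selected, best = trial, score
    return selected, best


# Assumed example data and classifier, not those used in the paper.
X, y = load_wine(return_X_y=True)
knn = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
selected, best = mi_forward_selection(X, y, knn)
print(f"kept {len(selected)} of {X.shape[1]} features, CV accuracy {best:.3f}")
```

A single pass in mutual-information order keeps the number of cross-validation runs linear in the feature count. A full sequential forward search would re-evaluate every remaining feature at each step, which costs more but may find a better subset.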
Pages: 23
Related papers
  • [1] An iterative SVM approach to feature selection and classification in high-dimensional datasets
    Liu, Dehua
    Qian, Hui
    Dai, Guang
    Zhang, Zhihua
    [J]. PATTERN RECOGNITION, 2013, 46 (09) : 2531 - 2537
  • [2] High-dimensional feature selection for genomic datasets
    Afshar, Majid
    Usefi, Hamid
    [J]. KNOWLEDGE-BASED SYSTEMS, 2020, 206
  • [3] Improved PSO for Feature Selection on High-Dimensional Datasets
    Tran, Binh
    Xue, Bing
    Zhang, Mengjie
    [J]. SIMULATED EVOLUTION AND LEARNING (SEAL 2014), 2014, 8886 : 503 - 515
  • [4] Dealing with high-dimensional class-imbalanced datasets: Embedded feature selection for SVM classification
    Maldonado, Sebastian
    Lopez, Julio
    [J]. APPLIED SOFT COMPUTING, 2018, 67 : 94 - 105
  • [5] Feature selection for high-dimensional classification using a competitive swarm optimizer
    Gu, Shenkai
    Cheng, Ran
    Jin, Yaochu
    [J]. SOFT COMPUTING, 2018, 22 (03) : 811 - 822
  • [6] Feature selection for high-dimensional classification using a competitive swarm optimizer
    Shenkai Gu
    Ran Cheng
    Yaochu Jin
    [J]. Soft Computing, 2018, 22 : 811 - 822
  • [7] Simultaneous Feature Selection and Classification for High-Dimensional Data
    Pai, Vriddhi
    Gupta, Subhash Chand
    [J]. PROCEEDINGS OF THE SECOND INTERNATIONAL CONFERENCE ON GREEN COMPUTING AND INTERNET OF THINGS (ICGCIOT 2018), 2018, : 153 - 158
  • [8] Efficient feature selection filters for high-dimensional data
    Ferreira, Artur J.
    Figueiredo, Mario A. T.
    [J]. PATTERN RECOGNITION LETTERS, 2012, 33 (13) : 1794 - 1804
  • [9] Efficient Learning and Feature Selection in High-Dimensional Regression
    Ting, Jo-Anne
    D'Souza, Aaron
    Vijayakumar, Sethu
    Schaal, Stefan
    [J]. NEURAL COMPUTATION, 2010, 22 (04) : 831 - 886
  • [10] A hybrid algorithm for feature subset selection in high-dimensional datasets using FICA and IWSSr algorithm
    Moradkhani, Mostafa
    Amiri, Ali
    Javaherian, Mohsen
    Safari, Hossein
    [J]. APPLIED SOFT COMPUTING, 2015, 35 : 123 - 135