Improved Filter-Based Feature Selection Using Correlation and Clustering Techniques

Authors
Atmakuru, Akhila [1]
Di Fatta, Giuseppe [2]
Nicosia, Giuseppe [3]
Badii, Atta [1]
Affiliations
[1] Univ Reading, Reading, Berks, England
[2] Free Univ Bozen Bolzano, Bolzano, Italy
[3] Univ Catania, Catania, Italy
Keywords
Feature Selection; Correlation; Clustering; Principal Coordinate Analysis; Neural Network and High Dimensional Dataset; Mutual Information; Relevance
DOI
10.1007/978-3-031-53969-5_28
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Feature engineering and feature selection are essential to most data science and machine learning applications: in the former, raw data are transformed into features; in the latter, features are selected to provide the most effective subset for the application. Feature selection techniques are particularly useful when dealing with high-dimensional datasets that contain noisy and redundant data. An optimised feature subset can enhance both the performance and the interpretability of the model. There are three types of feature selection methods, namely filter, wrapper and embedded techniques. Amongst these, the filter method is more efficient than the others, as it is computationally less expensive and more general. This work presents two improved filter-based feature selection methods based on a correlation coefficient and clustering techniques. The first approach is based on feature correlation: features above a similarity threshold identify a kind of neighbourhood for each feature, and the feature subset is built from these neighbourhoods. The second method applies clustering analysis to the correlation data to identify features that can represent an entire cluster. The obtained feature subsets have been applied as a pre-processing step for logistic regression and artificial neural networks. The performance of the proposed methods has been compared against the popular ReliefF feature selection method. The experimental analysis shows that the proposed feature selection methods provide an observable improvement in accuracy by choosing the most effective features.
Pages: 379-389
Page count: 11
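The two approaches outlined in the abstract can be illustrated with a minimal Python sketch. The abstract does not specify the correlation coefficient, the threshold value, how features are ranked, or which clustering algorithm is used, so everything below is an assumption made for illustration: Pearson correlation, a threshold of 0.8, ranking by absolute correlation with the target, and single-linkage grouping (connected components of the thresholded correlation graph). The function names are hypothetical and this is not the authors' implementation.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return 0.0 if sx == 0 or sy == 0 else cov / (sx * sy)


def abs_corr_matrix(columns):
    """Matrix of absolute pairwise correlations between feature columns."""
    p = len(columns)
    return [[abs(pearson(columns[i], columns[j])) for j in range(p)] for i in range(p)]


def select_by_neighbourhood(columns, target, threshold=0.8):
    """Assumed variant of the first approach: visit features from most to least
    relevant, keep each one not yet discarded, and discard its neighbours
    (features whose absolute correlation with it reaches the threshold)."""
    corr = abs_corr_matrix(columns)
    relevance = [abs(pearson(c, target)) for c in columns]
    order = sorted(range(len(columns)), key=lambda i: -relevance[i])
    selected, dropped = [], set()
    for i in order:
        if i in dropped:
            continue
        selected.append(i)
        dropped.update(j for j in range(len(columns))
                       if j != i and corr[i][j] >= threshold)
    return sorted(selected)


def select_cluster_representatives(columns, threshold=0.8):
    """Assumed variant of the second approach: group features into connected
    components of the thresholded correlation graph (single linkage), then keep
    the member most correlated, in total, with the rest of its cluster."""
    corr = abs_corr_matrix(columns)
    unseen = set(range(len(columns)))
    representatives = []
    while unseen:
        start = min(unseen)
        unseen.remove(start)
        cluster, stack = [start], [start]
        while stack:
            i = stack.pop()
            for j in [k for k in unseen if corr[i][k] >= threshold]:
                unseen.remove(j)
                cluster.append(j)
                stack.append(j)
        rep = max(cluster, key=lambda i: sum(corr[i][j] for j in cluster if j != i))
        representatives.append(rep)
    return sorted(representatives)


# Toy data (made up for illustration): f1 is nearly collinear with f0,
# while f2 is only weakly correlated with both.
f0 = [1, 2, 3, 4, 5, 6]
f1 = [2, 4, 6, 8, 10, 12.5]
f2 = [5, 1, 4, 2, 6, 3]
target = [0, 0, 0, 1, 1, 1]

print(select_by_neighbourhood([f0, f1, f2], target))
print(select_cluster_representatives([f0, f1, f2]))
```

Both functions return column indices, so the reduced dataset can be passed directly to a downstream classifier such as the logistic regression or neural network models mentioned in the abstract. On real high-dimensional data a vectorised correlation matrix and a library clustering routine would likely be preferable to these pure-Python loops.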