Feature selection using a sinusoidal sequence combined with mutual information

Cited by: 6
Authors:
Yuan, Gaoteng [1 ]
Lu, Lu [2 ]
Zhou, Xiaofeng [1 ]
Affiliations:
[1] Hohai Univ, Coll Comp & Informat, Nanjing 211100, Peoples R China
[2] Lib Nanjing Forestry Univ, Nanjing 210037, Peoples R China
Keywords:
Feature selection; Mutual information; Sinusoidal sequence; High-dimensional data; SSMI algorithm; FEATURE-EXTRACTION; CLASSIFICATION; ALGORITHM; FILTER;
DOI: 10.1016/j.engappai.2023.107168
Chinese Library Classification: TP [Automation Technology; Computer Technology]
Subject Classification Code: 0812
Abstract
Data classification is among the most common tasks in machine learning, and feature selection is a key step in any classification pipeline. Common feature selection methods mainly analyze the maximum relevance and minimum redundancy between features and labels while ignoring the number of key features, which inevitably leads to wasted effort in subsequent classification training. To address this problem, a feature selection algorithm (SSMI) based on the combination of sinusoidal sequences and mutual information is proposed. First, the mutual information between each feature and the label is calculated, and interference information in the high-dimensional data is removed according to the mutual information value. Second, a sine function is constructed, and sine ordering is carried out according to the mutual information value and the per-class mean of each feature. By adjusting the period and phase of the sequence, the feature set with the largest between-class difference is found, yielding the subset of key features. Finally, three machine learning classifiers (KNN, RF, SVM) are used to classify the key feature subsets, and several feature selection algorithms (JMI, mRMR, CMIM, SFS, etc.) are compared to assess the strengths and weaknesses of each. Compared with the other feature selection methods, the SSMI algorithm selects the fewest key features, an average reduction of 15 features, and improves average classification accuracy by 3% with the KNN classifier. On the HBV and SDHR datasets, the SSMI algorithm achieves classification accuracies of 81.26% and 83.12%, with sensitivity and specificity of 76.28%/87.39% and 68.14%/86.11%, respectively. These results show that the SSMI algorithm can achieve higher classification accuracy with a smaller feature subset.
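The abstract describes a two-stage pipeline: filter features by their mutual information with the label, then score the survivors with a tunable sine function to pick a small final subset. The paper's exact sine-ordering and per-class mean handling are not given in this record, so the sketch below is a hypothetical illustration only: `mutual_information`, `ssmi_like_select`, and all thresholds and defaults are assumptions for clarity, not the authors' implementation.

```python
import numpy as np

def mutual_information(x, y):
    """Empirical mutual information (in nats) between two discrete vectors."""
    x, y = np.asarray(x), np.asarray(y)
    mi = 0.0
    for xv in np.unique(x):
        for yv in np.unique(y):
            pxy = np.mean((x == xv) & (y == yv))  # joint probability
            if pxy > 0:
                px, py = np.mean(x == xv), np.mean(y == yv)  # marginals
                mi += pxy * np.log(pxy / (px * py))
    return mi

def ssmi_like_select(X, y, mi_threshold=0.01, period=2 * np.pi, phase=0.0, k=5):
    """Hypothetical SSMI-style selection:
    1) drop features whose MI with the label is below a threshold
       (the 'interference information' removal step);
    2) score the remaining features with a sine of their normalized MI,
       using a tunable period and phase;
    3) keep the k features with the largest sine score."""
    mi = np.array([mutual_information(X[:, j], y) for j in range(X.shape[1])])
    keep = np.where(mi >= mi_threshold)[0]
    norm = mi[keep] / mi[keep].max()            # normalize MI to [0, 1]
    score = np.sin(2 * np.pi / period * norm + phase)
    order = keep[np.argsort(-score)]            # descending sine score
    return order[:k], mi
```

With the default period and phase the sine score is monotone in the normalized MI, so the sketch degenerates to a plain MI filter; the point of the tunable period and phase in the paper is presumably to reshape that ordering, which this toy version only gestures at.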
Pages: 13
Related Papers (50 in total)
  • [21] A new algorithm for EEG feature selection using mutual information
    Deriche, M
    Al-Ani, A
    2001 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS I-VI, PROCEEDINGS, 2001: 1057 - 1060
  • [22] Using clustering and dynamic mutual information for topic feature selection
    Xu, Jian-min
    Wu, Shu Fang
    Zhu, Jie
    JOURNAL OF THE SOCIETY FOR INFORMATION DISPLAY, 2014, 22 (11) : 572 - 580
  • [23] A feature selection method using a fuzzy mutual information measure
    Grande, Javier
    Suarez, Maria del Rosario
    Villar, Jose Ramon
    INNOVATIONS IN HYBRID INTELLIGENT SYSTEMS, 2007, 44 : 56 - +
  • [24] Feature selection with missing data using mutual information estimators
    Doquire, Gauthier
    Verleysen, Michel
    NEUROCOMPUTING, 2012, 90 : 3 - 11
  • [25] Stable feature selection using copula based mutual information
    Lall, Snehalika
    Sinha, Debajyoti
    Ghosh, Abhik
    Sengupta, Debarka
    Bandyopadhyay, Sanghamitra
    PATTERN RECOGNITION, 2021, 112
  • [26] Feature Selection for Chemical Sensor Arrays Using Mutual Information
    Wang, X. Rosalind
    Lizier, Joseph T.
    Nowotny, Thomas
    Berna, Amalia Z.
    Prokopenko, Mikhail
    Trowell, Stephen C.
    PLOS ONE, 2014, 9 (03):
  • [27] An optimal feature selection technique using the concept of mutual information
    Al-Ani, A
    Deriche, M
    ISSPA 2001: SIXTH INTERNATIONAL SYMPOSIUM ON SIGNAL PROCESSING AND ITS APPLICATIONS, VOLS 1 AND 2, PROCEEDINGS, 2001, : 477 - 480
  • [28] Diagnosis by Support Vector Machines combined with feature selection based on mutual information
    Sun, Z.
    Xi, G.
    Yi, J.
    DYNAMICS OF CONTINUOUS DISCRETE AND IMPULSIVE SYSTEMS-SERIES B-APPLICATIONS & ALGORITHMS, 2006, 13E : 736 - 741
  • [29] Heterogeneous feature subset selection using mutual information-based feature transformation
    Wei, Min
    Chow, Tommy W. S.
    Chan, Rosa H. M.
    NEUROCOMPUTING, 2015, 168 : 706 - 718
  • [30] Novel Feature Selection Method using Mutual Information and Fractal Dimension
    Pham, D. T.
    Packianather, M. S.
    Garcia, M. S.
    Castellani, M.
    IECON: 2009 35TH ANNUAL CONFERENCE OF IEEE INDUSTRIAL ELECTRONICS, VOLS 1-6, 2009, : 3217 - +