Feature selection with missing data using mutual information estimators

Cited by: 55
Authors:
Doquire, Gauthier [1]
Verleysen, Michel [1]
Affiliations:
[1] Catholic Univ Louvain, Machine Learning Grp, ICTEAM, B-1348 Louvain, Belgium
Keywords:
Feature selection; Missing data; Mutual information; Functional data; Values; Imputation; Regression; Variables
DOI
10.1016/j.neucom.2012.02.031
Chinese Library Classification (CLC): TP18 [Artificial Intelligence Theory]
Discipline codes: 081104; 0812; 0835; 1405
Abstract
Feature selection is an important preprocessing task for many machine learning and pattern recognition applications, including regression and classification. Missing data are encountered in many real-world problems and have to be handled in practice. This paper addresses the problem of feature selection in prediction problems where some occurrences of the features are missing. To this end, the well-known mutual information criterion is used. More precisely, it is shown how a recently introduced nearest-neighbor-based mutual information estimator can be extended to handle missing data. Unlike traditional estimators, it does not require the direct estimation of any probability density function; consequently, the mutual information can be reliably estimated even as the dimension of the space increases. Results on artificial as well as real-world datasets indicate that the method is able to select important features without the need for any imputation algorithm, under the assumption that data are missing completely at random. Moreover, experiments show that selecting the features before imputing the data generally increases the precision of the prediction models, in particular when the proportion of missing data is high. (C) 2012 Elsevier B.V. All rights reserved.
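For illustration only, the sketch below shows one simple way to rank features by mutual information when entries are missing completely at random, using scikit-learn's k-NN-based mutual_info_regression (a Kraskov-style estimator) as a stand-in for the estimator discussed in the paper. The function mi_ranking_with_missing, the choice n_neighbors=6, and the per-feature complete-case handling are assumptions made here for the example; this univariate ranking is not the authors' multivariate estimator extension, only a minimal sketch of the select-before-impute workflow.

# Minimal illustrative sketch (not the authors' estimator extension):
# rank features by mutual information with the target, estimated with
# scikit-learn's k-NN-based mutual_info_regression, handling MCAR missing
# values by complete-case analysis per feature (each feature is scored on
# the rows where it is observed; the target is assumed fully observed).
import numpy as np
from sklearn.feature_selection import mutual_info_regression

def mi_ranking_with_missing(X, y, n_neighbors=6):
    """Return feature indices sorted by estimated MI with y (descending)."""
    n_features = X.shape[1]
    scores = np.full(n_features, -np.inf)
    for j in range(n_features):
        observed = ~np.isnan(X[:, j])          # complete cases for feature j
        if observed.sum() < n_neighbors + 2:   # too few observed samples
            continue
        scores[j] = mutual_info_regression(
            X[observed, j].reshape(-1, 1), y[observed],
            n_neighbors=n_neighbors,
        )[0]
    return np.argsort(scores)[::-1], scores

if __name__ == "__main__":
    # Toy data: y depends on features 0 and 3; about 20% of X is MCAR missing.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 10))
    y = X[:, 0] + 0.5 * X[:, 3] ** 2 + 0.1 * rng.normal(size=500)
    X[rng.random(X.shape) < 0.2] = np.nan
    order, _ = mi_ranking_with_missing(X, y)
    print("Features ranked by estimated MI:", order[:5])

The top-ranked features could then be passed to any imputation and prediction pipeline, mirroring the feature-selection-before-imputation ordering that the abstract reports as beneficial.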
Pages: 3-11 (9 pages)
Related papers (50 in total):
  • [1] POLARIMETRIC SAR DATA FEATURE SELECTION USING MEASURES OF MUTUAL INFORMATION
    Tanase, R.
    Radoi, A.
    Datcu, M.
    Raducanu, D.
    [J]. 2015 IEEE INTERNATIONAL GEOSCIENCE AND REMOTE SENSING SYMPOSIUM (IGARSS), 2015, : 1140 - 1143
  • [2] Unsupervised Feature Selection for Outlier Detection in Categorical Data using Mutual Information
    Suri, N. N. R. Ranga
    Murty, M. Narasimha
    Athithan, G.
    [J]. 2012 12TH INTERNATIONAL CONFERENCE ON HYBRID INTELLIGENT SYSTEMS (HIS), 2012, : 253 - 258
  • [3] Feature Selection using Mutual Information for High-dimensional Data Sets
    Nagpal, Arpita
    Gaur, Deepti
    Gaur, Seema
    [J]. SOUVENIR OF THE 2014 IEEE INTERNATIONAL ADVANCE COMPUTING CONFERENCE (IACC), 2014, : 45 - 49
  • [4] Feature selection using a mutual information based measure
    Al-Ani, A
    Deriche, M
    [J]. 16TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION, VOL IV, PROCEEDINGS, 2002, : 82 - 85
  • [5] Feature selection using mutual information in CT colonography
    Ong, Ju Lynn
    Seghouane, Abd-Krim
    [J]. PATTERN RECOGNITION LETTERS, 2011, 32 (02) : 337 - 341
  • [6] Effective feature selection scheme using mutual information
    Huang, D
    Chow, TWS
    [J]. NEUROCOMPUTING, 2005, 63 : 325 - 343
  • [7] Feature selection using Joint Mutual Information Maximisation
    Bennasar, Mohamed
    Hicks, Yulia
    Setchi, Rossitza
    [J]. EXPERT SYSTEMS WITH APPLICATIONS, 2015, 42 (22) : 8520 - 8532
  • [8] Using Mutual Information for Feature Selection in Programmatic Advertising
    Ciesielczyk, Michal
    [J]. 2017 IEEE INTERNATIONAL CONFERENCE ON INNOVATIONS IN INTELLIGENT SYSTEMS AND APPLICATIONS (INISTA), 2017, : 290 - 295
  • [9] Feature selection using Decomposed Mutual Information Maximization
    Macedo, Francisco
    Valadas, Rui
    Carrasquinha, Eunice
    Oliveira, M. Rosario
    Pacheco, Antonio
    [J]. NEUROCOMPUTING, 2022, 513 : 215 - 232
  • [10] Feature Selection for Text Classification Using Mutual Information
    Sel, Ilhami
    Karci, Ali
    Hanbay, Davut
    [J]. 2019 INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND DATA PROCESSING (IDAP 2019), 2019,