Feature selection with missing data using mutual information estimators

Cited by: 57
Authors
Doquire, Gauthier [1]
Verleysen, Michel [1]
Affiliations
[1] Catholic Univ Louvain, Machine Learning Grp, ICTEAM, B-1348 Louvain, Belgium
Keywords
Feature selection; Missing data; Mutual information; Functional data; Values; Imputation; Regression; Variables
DOI
10.1016/j.neucom.2012.02.031
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Feature selection is an important preprocessing task for many machine learning and pattern recognition applications, including regression and classification. Missing data are encountered in many real-world problems and have to be considered in practice. This paper addresses the problem of feature selection in prediction problems where some occurrences of features are missing. To this end, the well-known mutual information criterion is used. More precisely, it is shown how a recently introduced nearest-neighbors-based mutual information estimator can be extended to handle missing data. This estimator has the advantage over traditional ones that it does not directly estimate any probability density function. Consequently, the mutual information may be reliably estimated even when the dimension of the space increases. Results on artificial as well as real-world datasets indicate that the method is able to select important features without the need for any imputation algorithm, under the assumption that data are missing completely at random. Moreover, experiments show that selecting the features before imputing the data generally increases the precision of the prediction models, in particular when the proportion of missing data is high. © 2012 Elsevier B.V. All rights reserved.
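To make the idea in the abstract concrete, the sketch below scores features with a Kraskov-style k-nearest-neighbor mutual information estimator in which distances are computed only over the coordinates observed in both samples. This record does not describe the paper's actual extension, so everything here is an assumption for illustration: the partial max-norm distance, the per-sample normalization by the number of comparable neighbors, the NaN encoding of missing values, and the names `knn_mi_missing`, `partial_distance` and `digamma_int` are all hypothetical. With complete data the function reduces to the standard nearest-neighbor estimator.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329


def digamma_int(n):
    """Digamma function psi(n) for a positive integer n (harmonic-number form)."""
    return -EULER_GAMMA + sum(1.0 / j for j in range(1, n))


def partial_distance(a, b):
    """Max-norm distance over the coordinates observed in both rows.

    Missing entries are encoded as NaN (an assumption of this sketch).
    Returns None when the rows share no observed coordinate.
    """
    diffs = [abs(u - v) for u, v in zip(a, b)
             if not (math.isnan(u) or math.isnan(v))]
    return max(diffs) if diffs else None


def knn_mi_missing(X, y, k=3):
    """Estimate I(X; y) in nats; X may contain NaN, y is fully observed."""
    n = len(y)
    total, used = 0.0, 0
    for i in range(n):
        # (feature distance, output distance) to every comparable sample
        pairs = []
        for j in range(n):
            if j == i:
                continue
            dx = partial_distance(X[i], X[j])
            if dx is not None:
                pairs.append((dx, abs(y[i] - y[j])))
        if len(pairs) < k:
            continue  # too few comparable neighbours for this sample
        # distance to the k-th nearest neighbour in the joint space
        eps = sorted(max(dx, dy) for dx, dy in pairs)[k - 1]
        n_x = sum(1 for dx, _ in pairs if dx < eps)
        n_y = sum(1 for _, dy in pairs if dy < eps)
        # len(pairs) + 1 equals n when nothing is missing
        total += (digamma_int(len(pairs) + 1)
                  - digamma_int(n_x + 1) - digamma_int(n_y + 1))
        used += 1
    if used == 0:
        raise ValueError("no sample has k comparable neighbours")
    return digamma_int(k) + total / used


# Toy data: x1 drives y, x2 is noise; 20% of entries are removed
# completely at random.
rng = random.Random(0)
X, y = [], []
for _ in range(200):
    x1, x2 = rng.gauss(0, 1), rng.gauss(0, 1)
    y.append(x1 + 0.1 * rng.gauss(0, 1))
    X.append([v if rng.random() > 0.2 else float("nan") for v in (x1, x2)])

score_x1 = knn_mi_missing([[row[0]] for row in X], y)
score_x2 = knn_mi_missing([[row[1]] for row in X], y)
score_both = knn_mi_missing(X, y)
print(score_x1, score_x2, score_both)
```

Ranking features by such a score needs no imputation step, which is the property the abstract highlights. The quadratic neighbor search is kept for readability; a practical version would likely use a spatial index or vectorized distances.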
Pages: 3-11
Page count: 9