Feature selection with missing data using mutual information estimators

Cited by: 57
Authors
Doquire, Gauthier [1 ]
Verleysen, Michel [1 ]
Affiliations
[1] Catholic Univ Louvain, Machine Learning Grp, ICTEAM, B-1348 Louvain, Belgium
Keywords
Feature selection; Missing data; Mutual information; FUNCTIONAL DATA; VALUES; IMPUTATION; REGRESSION; VARIABLES
DOI
10.1016/j.neucom.2012.02.031
CLC number
TP18 [Artificial Intelligence Theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Feature selection is an important preprocessing task for many machine learning and pattern recognition applications, including regression and classification. Missing data are encountered in many real-world problems and have to be considered in practice. This paper addresses the problem of feature selection in prediction problems where some occurrences of features are missing. To this end, the well-known mutual information criterion is used. More precisely, it is shown how a recently introduced nearest-neighbors-based mutual information estimator can be extended to handle missing data. This estimator has the advantage over traditional ones that it does not directly estimate any probability density function. Consequently, the mutual information may be reliably estimated even when the dimension of the space increases. Results on artificial as well as real-world datasets indicate that the method is able to select important features without the need for any imputation algorithm, under the assumption that data are missing completely at random (MCAR). Moreover, experiments show that selecting the features before imputing the data generally increases the precision of the prediction models, in particular when the proportion of missing data is high. (C) 2012 Elsevier B.V. All rights reserved.
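The abstract's key ingredient, a nearest-neighbors-based mutual information estimator that avoids explicit density estimation, can be sketched as follows. This is an illustrative reimplementation in the style of the Kraskov k-NN estimator, not the authors' code; the function names and the complete-pairs handling of missing values in `mi_with_missing` are assumptions for illustration (a simple strategy that is only unbiased under MCAR, and simpler than the paper's actual extension).

```python
import numpy as np
from scipy.special import digamma

def knn_mutual_information(x, y, k=3):
    """Kraskov-style k-NN mutual information estimate (illustrative sketch).

    x, y: 1-D arrays of equal length. Works from neighbor counts in the
    max-norm joint space, so no probability density is ever estimated
    explicitly -- the property the abstract highlights.
    """
    x = np.asarray(x, dtype=float).reshape(-1, 1)
    y = np.asarray(y, dtype=float).reshape(-1, 1)
    n = len(x)
    # pairwise distances in each marginal and in the joint (max-norm)
    dx = np.abs(x - x.T)
    dy = np.abs(y - y.T)
    dj = np.maximum(dx, dy)
    np.fill_diagonal(dj, np.inf)          # exclude each point from its own neighborhood
    # eps_i: distance to the k-th nearest neighbor in the joint space
    eps = np.sort(dj, axis=1)[:, k - 1]
    # n_x(i), n_y(i): marginal neighbors strictly closer than eps_i
    np.fill_diagonal(dx, np.inf)
    np.fill_diagonal(dy, np.inf)
    nx = np.sum(dx < eps[:, None], axis=1)
    ny = np.sum(dy < eps[:, None], axis=1)
    mi = digamma(k) + digamma(n) - np.mean(digamma(nx + 1) + digamma(ny + 1))
    return max(mi, 0.0)                   # MI is non-negative; clip estimator noise

def mi_with_missing(x, y, k=3):
    """MI on incomplete data using only jointly observed pairs.

    Complete-pairs deletion is consistent under MCAR, but it is a
    simplification chosen for illustration, not the paper's estimator.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    mask = ~(np.isnan(x) | np.isnan(y))
    return knn_mutual_information(x[mask], y[mask], k=k)
```

With such an estimator, a greedy forward search can rank candidate features by their (conditional) mutual information with the target directly on the incomplete data, which is the selection-before-imputation workflow the abstract reports to improve prediction accuracy.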
Pages: 3 - 11
Page count: 9
Related papers
50 records in total
  • [31] Feature Selection for Chemical Sensor Arrays Using Mutual Information
    Wang, X. Rosalind
    Lizier, Joseph T.
    Nowotny, Thomas
    Berna, Amalia Z.
    Prokopenko, Mikhail
    Trowell, Stephen C.
    PLOS ONE, 2014, 9 (03):
  • [32] An optimal feature selection technique using the concept of mutual information
    Al-Ani, A
    Deriche, M
    ISSPA 2001: SIXTH INTERNATIONAL SYMPOSIUM ON SIGNAL PROCESSING AND ITS APPLICATIONS, VOLS 1 AND 2, PROCEEDINGS, 2001, : 477 - 480
  • [33] Feature selection using a sinusoidal sequence combined with mutual information
    Yuan, Gaoteng
    Lu, Lu
    Zhou, Xiaofeng
    ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2023, 126
  • [34] Feature Selection Method Based on Weighted Mutual Information for Imbalanced Data
    Li, Kewen
    Yu, Mingxiao
    Liu, Lu
    Li, Timing
    Zhai, Jiannan
    INTERNATIONAL JOURNAL OF SOFTWARE ENGINEERING AND KNOWLEDGE ENGINEERING, 2018, 28 (08) : 1177 - 1194
  • [35] Feature selection algorithm based on mutual information and lasso for microarray data
    Zhongxin W.
    Gang S.
    Jing Z.
    Jia Z.
Bentham Science Publishers, (10): 278 - 286
  • [36] Input feature selection for sensor data mining based on mutual information
    Huang, J. J.
    Cai, Y. Z.
    Xu, X. M.
    DYNAMICS OF CONTINUOUS DISCRETE AND IMPULSIVE SYSTEMS-SERIES B-APPLICATIONS & ALGORITHMS, 2006, 13E : 1203 - 1208
  • [37] Biases in feature selection with missing data
    Seijo-Pardo, Borja
    Alonso-Betanzos, Amparo
    Bennett, Kristin P.
    Bolon-Canedo, Veronica
    Josse, Julie
    Saeed, Mehreen
    Guyon, Isabelle
    NEUROCOMPUTING, 2019, 342 : 97 - 112
  • [38] Causal Feature Selection with Missing Data
    Yu, Kui
    Yang, Yajing
    Ding, Wei
    ACM TRANSACTIONS ON KNOWLEDGE DISCOVERY FROM DATA, 2022, 16 (04)
  • [39] Heterogeneous feature subset selection using mutual information-based feature transformation
    Wei, Min
    Chow, Tommy W. S.
    Chan, Rosa H. M.
    NEUROCOMPUTING, 2015, 168 : 706 - 718
  • [40] Novel Feature Selection Method using Mutual Information and Fractal Dimension
    Pham, D. T.
    Packianather, M. S.
    Garcia, M. S.
    Castellani, M.
    IECON: 2009 35TH ANNUAL CONFERENCE OF IEEE INDUSTRIAL ELECTRONICS, VOLS 1-6, 2009, : 3217 - +