A novel feature selection framework for incomplete data

Cited by: 0
Authors
Guo, Cong [1]
Yang, Wei [1]
Li, Zheng [1]
Liu, Chun [1]
Affiliation
[1] Henan Univ, Sch Comp & Informat Engn, Henan Key Lab Big Data Anal & Proc, Henan Engn Lab Spatial Informat Proc, Kaifeng 475004, Peoples R China
Keywords
Feature selection; Incomplete data; ReliefF; Matrix completion; Missing values; Classification
DOI
10.1016/j.chemolab.2024.105193
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Feature selection on incomplete datasets is a challenging task. Existing methods address it in two steps: they first complete the dataset with an imputation method and then perform feature selection on the imputed data. Because missing-value imputation and feature selection are carried out entirely independently, feature importance cannot be taken into account during imputation, even though in real-world datasets different features differ in importance. To this end, we propose a novel feature selection framework for incomplete data that takes feature importance into account. The framework consists of two stages that alternate iteratively: the M-stage and the W-stage. In the M-stage, missing values are imputed based on a given feature importance vector and multiple initial imputation results. In the W-stage, an improved ReliefF algorithm learns the feature importance vector from the imputed data. The feature importance vector output by the W-stage in one iteration serves as the input to the M-stage in the next iteration. Experimental results on datasets with artificial and real missing values show that the proposed method significantly outperforms the compared approaches.
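The abstract does not give the update rules of either stage, so the following is only a minimal Python sketch of the alternating loop under loudly stated assumptions. The M-stage here is a hypothetical stand-in: it averages two assumed initial imputations (column mean and column median) and then refines each missing entry by feature-weighted k-nearest-neighbour averaging, with the weights taken from the current importance vector. The W-stage is standard multi-class ReliefF, not the authors' improved variant. The uniform starting weights and the names `m_stage`, `relieff`, `mean_impute`, `median_impute` and `select_features` are all assumptions introduced for illustration.

```python
import numpy as np


def relieff(X, y, n_neighbors=5):
    """Standard multi-class ReliefF (NOT the paper's improved variant).

    Features are min-max scaled so each per-feature diff lies in [0, 1];
    every instance is used once as a reference (m = n).
    """
    n, d = X.shape
    lo = X.min(axis=0)
    span = X.max(axis=0) - lo
    span[span == 0] = 1.0
    Xs = (X - lo) / span
    classes, counts = np.unique(y, return_counts=True)
    prior = dict(zip(classes, counts / n))
    w = np.zeros(d)
    for i in range(n):
        diffs = np.abs(Xs - Xs[i])        # per-feature diff to every instance
        dist = diffs.sum(axis=1)          # Manhattan distance
        dist[i] = np.inf                  # never pick the instance itself
        for c in classes:
            cand = np.where((y == c) & np.isfinite(dist))[0]
            k = min(n_neighbors, len(cand))
            if k == 0:
                continue
            nn = cand[np.argsort(dist[cand])[:k]]
            contrib = diffs[nn].mean(axis=0)   # average diff over k neighbours
            if c == y[i]:
                w -= contrib / n               # nearest hits lower the weight
            else:                              # nearest misses raise it, prior-weighted
                w += prior[c] / (1.0 - prior[y[i]]) * contrib / n
    return w


def m_stage(X_obs, missing, initial_imputations, w, k=5):
    """HYPOTHETICAL M-stage: consensus of the initial imputations, refined by
    feature-weighted kNN averaging. The paper's actual rule is not given."""
    X_fill = np.mean(np.stack(initial_imputations), axis=0)
    X_fill[~missing] = X_obs[~missing]         # observed entries stay untouched
    scale = X_fill.std(axis=0)
    scale[scale == 0] = 1.0
    Z = X_fill / scale                         # standardise before measuring distance
    ww = np.clip(w, 0.0, None) + 1e-12         # ReliefF weights may be negative
    ww = ww / ww.sum()
    X_new = X_fill.copy()
    for i in np.where(missing.any(axis=1))[0]:
        obs = ~missing[i]
        if not obs.any():
            continue                           # nothing observed to compare on
        d2 = (((Z[:, obs] - Z[i, obs]) ** 2) * ww[obs]).sum(axis=1)
        d2[i] = np.inf
        nn = np.argsort(d2)[:k]
        X_new[i, missing[i]] = X_fill[nn][:, missing[i]].mean(axis=0)
    return X_new


def mean_impute(X):
    return np.where(np.isnan(X), np.nanmean(X, axis=0), X)


def median_impute(X):
    return np.where(np.isnan(X), np.nanmedian(X, axis=0), X)


def select_features(X_obs, y, n_iter=5, k=5):
    """Alternate M- and W-stages; the weights from one iteration feed the
    M-stage of the next. Requires n_iter >= 1."""
    missing = np.isnan(X_obs)
    inits = [mean_impute(X_obs), median_impute(X_obs)]  # assumed initial imputations
    d = X_obs.shape[1]
    w = np.full(d, 1.0 / d)                  # assumed uniform importance to start
    for _ in range(n_iter):
        X_hat = m_stage(X_obs, missing, inits, w, k=k)   # M-stage
        w = relieff(X_hat, y, n_neighbors=k)             # W-stage
    return w, X_hat
```

On synthetic data where only the first column determines the class and about 10% of entries are missing, `select_features(X_obs, y)` would be expected to give column 0 the largest weight; the returned vector is a ranking signal, so selection would keep the top-scoring features. Recomputing the M-stage from the fixed initial imputations in every iteration, rather than from the previous iteration's output, is a design choice of this sketch.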
Pages: 13
Related papers (50 in total)
  • [31] Feature Selection and Classification of Big Data Using MapReduce Framework
    Devi, D. Renuka
    Sasikala, S.
    INTELLIGENT COMPUTING, INFORMATION AND CONTROL SYSTEMS, ICICCS 2019, 2020, 1039 : 666 - 673
  • [32] A novel minorization–maximization framework for simultaneous feature selection and clustering of high-dimensional count data
    Zamzami, Nuha
    Bouguila, Nizar
    Pattern Analysis and Applications, 2023, 26 : 91 - 106
  • [33] Novel and efficient method on feature selection and data classification
    Chen, Tieming
    Ma, Jixia
    Huang, Samuel H.
    Cai, Jiamei
    Jisuanji Yanjiu yu Fazhan/Computer Research and Development, 2012, 49 (04) : 735 - 745
  • [34] A novel feature selection approach for biomedical data classification
    Peng, Yonghong
    Wu, Zhiqing
    Jiang, Jianmin
    JOURNAL OF BIOMEDICAL INFORMATICS, 2010, 43 (01) : 15 - 23
  • [35] A Structure-Induced Framework for Multi-Label Feature Selection With Highly Incomplete Labels
    Xu, Tiantian
    Zhao, Long
    IEEE ACCESS, 2020, 8 : 71219 - 71230
  • [36] Mixed feature selection in incomplete decision table
    Zhao, Hua
    Qin, Keyun
    KNOWLEDGE-BASED SYSTEMS, 2014, 57 : 181 - 190
  • [37] Multi-Round Random Subspace Feature Selection for Incomplete Gene Expression Data
    Pearson, Will
    Tran, Cao Truong
    Zhang, Mengjie
    Xue, Bing
    2019 IEEE CONGRESS ON EVOLUTIONARY COMPUTATION (CEC), 2019 : 2544 - 2551
  • [38] Improving performance for classification with incomplete data using wrapper-based feature selection
    Tran, C. T.
    Zhang, M.
    Andreae, P.
    Xue, B.
    Evolutionary Intelligence, 2016, 9 (3) : 81 - 94
  • [39] Incremental feature selection for dynamic incomplete data using sub-tolerance relations
    Zhao, Jie
    Ling, Yun
    Huang, Faliang
    Wang, Jiahai
    See-To, Eric W. K.
    PATTERN RECOGNITION, 2024, 148
  • [40] Multi-label cost-sensitive feature selection algorithm in incomplete data
    Huang, Qin
    Qian, Wenbin
    Shu, Wenhao
    Wu, Binglong
    Feng, Shuangshuang
    PROCEEDINGS OF 2018 INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND CYBERNETICS (ICMLC), VOL 1, 2018 : 56 - 62