Stable feature selection using copula based mutual information

Cited by: 34
Authors
Lall, Snehalika [1]
Sinha, Debajyoti [2, 3, 4]
Ghosh, Abhik [5, 6]
Sengupta, Debarka [7, 8, 9, 10]
Bandyopadhyay, Sanghamitra [1]
Affiliations
[1] Indian Stat Inst, Machine Intelligence Unit, Kolkata 700108, W Bengal, India
[2] Indian Stat Inst, SyMeC Data Ctr, Kolkata 700108, W Bengal, India
[3] Univ Nantes, CHU Nantes, INSERM, Ctr Rech Transplantat Immunol, ITUN, UMR1064, Nantes, France
[4] Univ Calcutta, Dept Comp Sci & Engn, Kolkata 700098, W Bengal, India
[5] Indian Stat Inst, Interdisciplinary Stat Res Unit, Kolkata 700108, W Bengal, India
[6] Indian Stat Inst, Ctr Artificial Intelligence & Machine Learning, Kolkata 700108, W Bengal, India
[7] Indraprastha Inst Informat Technol, Ctr Computat Biol, Phase 3, New Delhi 110020, India
[8] Indraprastha Inst Informat Technol, Dept Comp Sci & Engn, Phase 3, New Delhi 110020, India
[9] Indraprastha Inst Informat Technol, Ctr Artificial Intelligence, Phase 3, New Delhi 110020, India
[10] Queensland Univ Technol, Inst Hlth & Biomed Innovat, Brisbane, Qld, Australia
Keywords
Copula; Feature selection; Mutual information; Stability; Classification accuracy
DOI
10.1016/j.patcog.2020.107697
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Feature selection is a key step in many machine learning tasks. Most existing feature selection methods address the problem by devising a scoring function that treats the features independently, thereby overlooking their interdependencies. We leverage the scale invariance property of copulas to construct a greedy, supervised feature selection algorithm that maximizes feature relevance while minimizing redundant information content. The proposed Copula Based Feature Selection (CBFS) uses a multivariate copula to discover the dependence structure between features. Incorporating copula-based multivariate dependency in the formulation of mutual information avoids averaging over multiple instances of bivariate dependencies, thus eliminating the average estimation error introduced when bivariate dependency is used between pairs of feature variables. Under a controlled setting, our algorithm outperformed the existing best practice methods in warding off noise in the data. On several real and synthetic datasets, the proposed algorithm performed competitively in maximizing classification accuracy. CBFS also outperforms the other methods in terms of noise tolerance. © 2020 Elsevier Ltd. All rights reserved.
Pages: 11
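
The abstract describes a greedy selection that trades feature relevance against redundancy on a copula (rank) scale. The sketch below illustrates that general idea only and is not the authors' CBFS: it assumes a simple histogram estimate of mutual information and, unlike CBFS, measures redundancy by averaging bivariate dependencies, which is exactly the simplification the abstract says CBFS avoids. The function names `copula_bins`, `discrete_mi`, and `greedy_select` are hypothetical.

```python
import numpy as np

def copula_bins(x, bins=8):
    """Rank-transform a continuous sample to (0, 1), then cut into equal-width bins.

    Ranks depend only on ordering, so any strictly increasing transform of x
    yields the same codes (the scale invariance of the copula).
    """
    x = np.asarray(x, dtype=float)
    ranks = np.argsort(np.argsort(x, kind="stable"), kind="stable") + 1
    u = ranks / (len(x) + 1.0)           # pseudo-observations in (0, 1)
    return np.floor(u * bins).astype(int)

def discrete_mi(a, b):
    """Plug-in mutual information (in nats) between two discrete 1-D arrays."""
    a = np.unique(a, return_inverse=True)[1]
    b = np.unique(b, return_inverse=True)[1]
    joint = np.zeros((a.max() + 1, b.max() + 1))
    np.add.at(joint, (a, b), 1.0)        # contingency table of co-occurrences
    p = joint / joint.sum()
    outer = p.sum(axis=1, keepdims=True) * p.sum(axis=0, keepdims=True)
    nz = p > 0
    return float(np.sum(p[nz] * np.log(p[nz] / outer[nz])))

def greedy_select(X, y, k, bins=8):
    """Greedily pick k columns of X: relevance to y minus mean redundancy.

    Assumption: redundancy is the average pairwise MI with the features
    already selected (a bivariate baseline, not a multivariate copula).
    """
    codes = [copula_bins(X[:, j], bins) for j in range(X.shape[1])]
    relevance = [discrete_mi(c, y) for c in codes]
    selected = []
    while len(selected) < k:
        best, best_score = None, -np.inf
        for j in range(len(codes)):
            if j in selected:
                continue
            redundancy = (
                np.mean([discrete_mi(codes[j], codes[s]) for s in selected])
                if selected else 0.0
            )
            score = relevance[j] - redundancy
            if score > best_score:
                best, best_score = j, score
        selected.append(best)
    return selected
```

As a usage example, build `X` from an informative feature, a monotone transform of it such as `np.exp`, a second informative feature, and a pure noise column, with binary labels `y` derived from the two informative features. Calling `greedy_select(X, y, k=2)` should then skip the transformed duplicate, because its rank codes coincide with those of the original feature and its redundancy term is therefore large.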