Efficient feature selection using shrinkage estimators

Cited by: 24
Authors
Sechidis, Konstantinos [1 ]
Azzimonti, Laura [2 ]
Pocock, Adam [3 ]
Corani, Giorgio [2 ]
Weatherall, James [4 ]
Brown, Gavin [1 ]
Affiliations
[1] Univ Manchester, Sch Comp Sci, Manchester, Lancs, England
[2] Ist Dalle Molle Studi Sull Intelligenza Artificia, Manno, Switzerland
[3] Oracle Labs, Burlington, MA USA
[4] AstraZeneca, Global Med Dev, Adv Analyt Ctr, Cambridge, England
Funding
Engineering and Physical Sciences Research Council (EPSRC), UK;
Keywords
Feature selection; High order feature selection; Mutual information; Shrinkage estimators; MUTUAL INFORMATION; ENTROPY; DEPENDENCIES; ALGORITHMS; INFERENCE;
DOI
10.1007/s10994-019-05795-1
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline codes
081104; 0812; 0835; 1405;
Abstract
Information theoretic feature selection methods quantify the importance of each feature by estimating mutual information terms that capture relevancy, redundancy and complementarity. These terms are commonly estimated by maximum likelihood, while an under-explored area of research is how to use shrinkage methods instead. Our work suggests a novel shrinkage method for data-efficient estimation of information theoretic terms. Its small-sample behaviour makes it particularly suitable for estimating discrete distributions with a large number of categories (bins). Using our novel estimators we derive a framework for generating feature selection criteria that capture any high-order feature interaction for redundancy and complementarity. We perform a thorough empirical study across datasets from diverse sources and using various evaluation measures. Our first finding is that our shrinkage-based methods achieve better results while keeping the same computational cost as the simple maximum-likelihood-based methods. Furthermore, under our framework we derive efficient novel high-order criteria that outperform state-of-the-art methods in various tasks.
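To illustrate the general idea of shrinkage estimation of mutual information described in the abstract, the sketch below uses a standard James-Stein-type shrinkage of the maximum-likelihood cell probabilities toward a uniform target (in the style of Hausser and Strimmer); the paper's own estimator and its choice of shrinkage target and intensity may differ, so this is only an assumed illustrative construction, not the authors' method.

```python
import numpy as np

def js_shrinkage_probs(counts):
    """Shrink ML cell probabilities toward the uniform distribution.

    Uses the closed-form James-Stein-type shrinkage intensity, clipped
    to [0, 1]. This is a generic construction for illustration only.
    """
    counts = np.asarray(counts, dtype=float).ravel()
    n = counts.sum()
    K = counts.size
    p_ml = counts / n                      # maximum-likelihood estimate
    target = np.full(K, 1.0 / K)           # uniform shrinkage target
    num = 1.0 - np.sum(p_ml ** 2)
    den = (n - 1.0) * np.sum((target - p_ml) ** 2)
    lam = 1.0 if den == 0 else min(1.0, max(0.0, num / den))
    return lam * target + (1.0 - lam) * p_ml

def shrinkage_mutual_information(x, y):
    """Estimate I(X;Y) from the shrunk joint distribution of two
    discrete variables given as integer/categorical arrays."""
    xs, xi = np.unique(x, return_inverse=True)
    ys, yi = np.unique(y, return_inverse=True)
    joint = np.zeros((xs.size, ys.size))
    np.add.at(joint, (xi, yi), 1)          # joint contingency table
    p = js_shrinkage_probs(joint).reshape(xs.size, ys.size)
    px = p.sum(axis=1, keepdims=True)
    py = p.sum(axis=0, keepdims=True)
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / (px @ py)[mask])))
```

In a feature-selection loop, such an estimator would replace the plug-in ML mutual information inside criteria that score relevancy, redundancy and complementarity; the shrinkage pulls sparse small-sample counts toward the target, which is what makes large contingency tables tractable.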
Pages: 1261-1286 (26 pages)
Related papers (50 total)
  • [41] Efficient Feature Selection Method Using Contribution Ratio by Random Forest
    Murata, Ryuei
    Mishina, Yohei
    Yamauchi, Yuji
    Yamashita, Takayoshi
    Fujiyoshi, Hironobu
    2015 21ST KOREA-JAPAN JOINT WORKSHOP ON FRONTIERS OF COMPUTER VISION, 2015,
  • [42] An Efficient Traffic Classification Scheme Using Embedded Feature Selection and LightGBM
    Hua, Yanpei
    2020 INFORMATION COMMUNICATION TECHNOLOGIES CONFERENCE (ICTC), 2020, : 125 - 130
  • [43] An efficient feature selection using multi-criteria in text categorization
    Doan, S
    Horiguchi, S
    HIS'04: FOURTH INTERNATIONAL CONFERENCE ON HYBRID INTELLIGENT SYSTEMS, PROCEEDINGS, 2005, : 86 - 91
  • [44] An Efficient Feature Selection Algorithm for Gene Families Using NMF and ReliefF
    Liu, Kai
    Chen, Qi
    Huang, Guo-Hua
    GENES, 2023, 14 (02)
  • [45] Efficient Classification of DDoS Attacks Using an Ensemble Feature Selection Algorithm
    Singh, Khundrakpam Johnson
    De, Tanmay
    JOURNAL OF INTELLIGENT SYSTEMS, 2020, 29 (01) : 71 - 83
  • [46] Scene classification using efficient low-level feature selection
    Lee, Chu-Hui
    Hsu, Chi-Hung
    IMECS 2008: INTERNATIONAL MULTICONFERENCE OF ENGINEERS AND COMPUTER SCIENTISTS, VOLS I AND II, 2008, : 675 - 679
  • [47] Efficient Breast Cancer Detection Using Sequential Feature Selection Techniques
    Mohamed, Taha Mahdy
    2015 IEEE SEVENTH INTERNATIONAL CONFERENCE ON INTELLIGENT COMPUTING AND INFORMATION SYSTEMS (ICICIS), 2015, : 458 - 464
  • [48] Efficient saliency detection using convolutional neural networks with feature selection
    Cao, Feilong
    Liu, Yuehua
    Wang, Dianhui
    INFORMATION SCIENCES, 2018, 456 : 34 - 49
  • [49] An Efficient Feature Selection Technique for User Authentication using Keystroke Dynamics
    Shanmugapriya, D.
    Padmavathi, G.
    INTERNATIONAL JOURNAL OF COMPUTER SCIENCE AND NETWORK SECURITY, 2011, 11 (10): : 191 - 195
  • [50] DOUBLE BOOTSTRAP FOR SHRINKAGE ESTIMATORS
    VINOD, HD
    JOURNAL OF ECONOMETRICS, 1995, 68 (02) : 287 - 302