Similarity of feature selection methods: An empirical study across data intensive classification tasks

Cited by: 59
Authors
Dessi, Nicoletta [1 ]
Pes, Barbara [1 ]
Affiliation
[1] Univ Cagliari, Dipartimento Matemat & Informat, I-09124 Cagliari, Italy
Keywords
Data mining; Knowledge discovery; Feature selection; Similarity measures; GENE SELECTION; FEATURE-EXTRACTION; PREDICTION; CANCER; ALGORITHMS; REDUCTION; SYSTEM; TUMOR
DOI
10.1016/j.eswa.2015.01.069
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In the past two decades, the dimensionality of datasets involved in machine learning and data mining applications has increased explosively. Therefore, feature selection has become a necessary step to make the analysis more manageable and to extract useful knowledge about a given domain. A large variety of feature selection techniques are available in the literature, and their comparative analysis is a very difficult task. So far, few studies have investigated, from a theoretical and/or experimental point of view, the degree of similarity/dissimilarity among the available techniques, namely the extent to which they tend to produce similar results within specific application contexts. This kind of similarity analysis is of crucial importance when two or more methods are combined in an ensemble fashion: indeed, the ensemble paradigm is beneficial only if the involved methods are capable of giving different and complementary representations of the considered domain. This paper contributes in this direction by proposing an empirical approach to evaluate the degree of consistency among the outputs of different selection algorithms in the context of high-dimensional classification tasks. Leveraging a proper similarity index, we systematically compared the feature subsets selected by eight popular selection methods, representative of different selection approaches, and derived a similarity trend for feature subsets of increasing size. Through extensive experimentation involving sixteen datasets from three challenging domains (Internet advertisements, text categorization and microarray data classification), we obtained useful insight into the pattern of agreement of the considered methods. In particular, our results revealed how multivariate selection approaches systematically produce feature subsets that overlap to a small extent with those selected by the other methods. (C) 2015 Elsevier Ltd. All rights reserved.
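The similarity trend described in the abstract can be illustrated with a minimal sketch. The abstract does not name the similarity index the authors used, so the Jaccard index below is an assumed stand-in, and the two rankings, their feature names, and the function names are hypothetical illustrations rather than the paper's data or code.

```python
from typing import Dict, Sequence


def jaccard(a: Sequence[str], b: Sequence[str]) -> float:
    """Jaccard index |A n B| / |A u B| of two feature subsets (assumed index)."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        return 1.0  # two empty subsets are treated as identical
    return len(set_a & set_b) / len(union)


def similarity_trend(
    ranking_a: Sequence[str],
    ranking_b: Sequence[str],
    sizes: Sequence[int],
) -> Dict[int, float]:
    """Similarity of the top-k subsets of two rankings, for each k in sizes."""
    return {k: jaccard(ranking_a[:k], ranking_b[:k]) for k in sizes}


# Hypothetical feature rankings (best feature first) from two selection methods.
ranking_univariate = ["f1", "f2", "f3", "f4", "f5", "f6"]
ranking_multivariate = ["f2", "f1", "f7", "f3", "f8", "f9"]

trend = similarity_trend(ranking_univariate, ranking_multivariate, sizes=[2, 4, 6])
for k, sim in trend.items():
    print(f"top-{k}: similarity = {sim:.3f}")
```

In this toy case the two rankings agree fully on the top two features and diverge as the subset grows, which is the kind of pattern a similarity trend over increasing subset sizes is meant to expose.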
Pages: 4632-4642
Page count: 11
Related papers
50 in total
  • [21] Microarray Data Classification Using Feature Selection and Regularized Methods with Sampling Methods
    Jyothi, Saddi
    Reddy, Y. Sowmya
    Lavanya, K.
    UBIQUITOUS INTELLIGENT SYSTEMS, 2022, 302 : 351 - 358
  • [22] An empirical comparison of feature reduction methods in the context of microarray data classification
    Kestler, Hans A.
    Muessel, Christoph
    ARTIFICIAL NEURAL NETWORKS IN PATTERN RECOGNITION, PROCEEDINGS, 2006, 4087 : 260 - 273
  • [23] An Empirical Study of Univariate and Genetic Algorithm-Based Feature Selection in Binary Classification with Microarray Data
    Lecocke, Michael
    Hess, Kenneth
    CANCER INFORMATICS, 2006, 2 : 313 - 327
  • [24] An Empirical Study on the Performance of Rule-Based Classification by Feature Selection
    Balakrishnan, Sarojini
    Babu, M. R.
    Krishna, P. V.
    2014 WORLD CONGRESS ON COMPUTING AND COMMUNICATION TECHNOLOGIES (WCCCT 2014), 2014, : 147 - +
  • [25] The optimal combination of feature selection and data discretization: An empirical study
    Tsai, Chih-Fong
    Chen, Yu-Chi
    INFORMATION SCIENCES, 2019, 505 : 282 - 293
  • [26] COMPARISON FEATURE SELECTION METHODS FOR SUBTROPICAL VEGETATION CLASSIFICATION WITH HYPERSPECTRAL DATA
    Li, Qiaosi
    Wong, Frankie Kwan Kit
    Fung, Tung
    2019 IEEE INTERNATIONAL GEOSCIENCE AND REMOTE SENSING SYMPOSIUM (IGARSS 2019), 2019, : 3693 - 3696
  • [27] Arabic Text Classification: A Review Study on Feature Selection Methods
    Hijazi, Musab Mustafa
    Zeki, Akram
    Ismail, Amelia
    2021 22ND INTERNATIONAL ARAB CONFERENCE ON INFORMATION TECHNOLOGY (ACIT), 2021, : 554 - 559
  • [28] Feature Selection Methods for Text Classification
    Dasgupta, Anirban
    Drineas, Petros
    Harb, Boulos
    Josifovski, Vanja
    Mahoney, Michael W.
    KDD-2007 PROCEEDINGS OF THE THIRTEENTH ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, 2007, : 230 - +
  • [29] Comparative study of feature selection methods on microarray data
    Miyamoto, T
    Uchimura, S
    Hamamoto, Y
    Iizuka, N
    Oka, M
    Yamada-Okabe, H
    IEEE EMBS APBME 2003, 2003, : 82 - 83
  • [30] The Effect of Feature Selection on Phish Website Detection: An Empirical Study on Robust Feature Subset Selection for Effective Classification
    Zuhair, Hiba
    Selmat, Ali
    Salleh, Mazleena
    INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2015, 6 (10) : 221 - 232