Feature selection based on feature interactions with application to text categorization

Cited by: 61
Authors
Tang, Xiaochuan [1, 2]
Dai, Yuanshun [2 ]
Xiang, Yanping [2 ]
Affiliations
[1] Chengdu Univ Technol, Sch Cyber Secur, Chengdu 610059, Sichuan, Peoples R China
[2] Univ Elect Sci & Technol China, Sch Comp Sci & Engn, Chengdu 611731, Sichuan, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Feature selection; Feature interaction; Mutual information; Joint mutual information; Text categorization; Framework
DOI
10.1016/j.eswa.2018.11.018
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Feature selection is an important preprocessing approach for machine learning and text mining. It reduces the dimensionality of high-dimensional data. A popular approach is based on information-theoretic measures. Most existing methods use two- and three-dimensional mutual information terms, which are ineffective at detecting higher-order feature interactions. To fill this gap, we employ two- through five-way interactions for feature selection. We first identify a relaxed assumption to decompose the mutual information-based feature selection problem into a sum of low-order interactions. A direct calculation of the decomposed interaction terms is computationally expensive. We employ five-dimensional joint mutual information, a computationally efficient measure, to estimate the interaction terms. We use the 'maximum of the minimum' nonlinear approach to avoid overestimating feature significance. We also apply the proposed method to text categorization. To evaluate the performance of the proposed method, we compare it with eleven popular feature selection methods on eighteen benchmark datasets and seven text categorization datasets. Experimental results with four different types of classifiers provide concrete evidence that higher-order interactions are effective in improving feature selection methods. (C) 2018 Elsevier Ltd. All rights reserved.
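The 'maximum of the minimum' idea mentioned in the abstract can be illustrated with a much simpler, low-order sketch. The code below is not the authors' five-way method: it assumes discrete features, uses plug-in (count-based) estimates, and scores each candidate only by pairwise joint mutual information with the already selected features. The function names `mutual_information`, `joint_mi`, and `select_max_min` are hypothetical and chosen for illustration.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Plug-in estimate of I(X; Y) in bits from two equal-length discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    # p(x, y) * log2(p(x, y) / (p(x) * p(y))), written in terms of raw counts.
    return sum(
        (c / n) * log2(c * n / (px[x] * py[y]))
        for (x, y), c in joint.items()
    )


def joint_mi(columns, labels):
    """I(X_1, ..., X_k; Y): the tuple of feature values is treated as one variable."""
    return mutual_information(list(zip(*columns)), labels)


def select_max_min(features, labels, k):
    """Greedy 'maximum of the minimum' selection (pairwise simplification).

    features: dict mapping a feature name to its list of discrete values.
    Returns the names of up to k selected features, in selection order.
    """
    remaining = sorted(features)
    # Seed with the single feature that is most informative about the label.
    first = max(remaining, key=lambda f: mutual_information(features[f], labels))
    selected = [first]
    remaining.remove(first)
    while remaining and len(selected) < k:
        # A candidate's score is its worst-case joint MI with any selected feature,
        # so one lucky pairing cannot inflate its apparent significance.
        def score(f):
            return min(joint_mi([features[f], features[s]], labels) for s in selected)

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy XOR data (an assumption for illustration): the label is a XOR b.
a = [0, 0, 1, 1]
b = [0, 1, 0, 1]
y = [0, 1, 1, 0]
print(mutual_information(a, y), joint_mi([a, b], y))
print(select_max_min({"a": a, "b": b}, y, k=2))
```

On the XOR toy data, each feature alone carries no information about the label while the pair determines it completely, which is the kind of interaction that low-order terms miss. Extending the score to three or more features at once requires far more samples for reliable count-based estimates, which is one reason efficient estimators matter here.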
Pages: 207 - 216
Page count: 10
Related papers
50 in total
  • [31] An Algorithm of Feature Selection in Text Categorization Based on Gini-index
    Zhu, Wei-Dong
    Wang, Bo
    Lin, Yong-Min
    PROCEEDINGS OF THE 2015 INTERNATIONAL CONFERENCE ON MANAGEMENT SCIENCE AND MANAGEMENT INNOVATION, 2015, 6 : 272 - 278
  • [32] Improved Information Gain-based Feature Selection for Text Categorization
    Gao, Zhe
    Xu, Yajing
    Meng, Fanyu
    Qi, Feng
    Lin, Zhiqing
    2014 4TH INTERNATIONAL CONFERENCE ON WIRELESS COMMUNICATIONS, VEHICULAR TECHNOLOGY, INFORMATION THEORY AND AEROSPACE & ELECTRONIC SYSTEMS (VITAE), 2014,
  • [33] Lazy learner text categorization algorithm based on embedded feature selection
    Yan Peng
    Journal of Systems Engineering and Electronics, 2009, 20 (03) : 651 - 659
  • [34] Relative term-frequency based feature selection for text categorization
    Yang, SM
    Wu, XB
    Deng, ZH
    Zhang, M
    Yang, DQ
    2002 INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND CYBERNETICS, VOLS 1-4, PROCEEDINGS, 2002, : 1432 - 1436
  • [35] An empirical study of feature selection for text categorization based on term weightage
    How, BC
    Narayanan, K
    IEEE/WIC/ACM INTERNATIONAL CONFERENCE ON WEB INTELLIGENCE (WI 2004), PROCEEDINGS, 2004, : 599 - 602
  • [36] Temporal-based Feature Selection and Transfer Learning for Text Categorization
    Fukumoto, Fumiyo
    Suzuki, Yoshimi
    2015 7TH INTERNATIONAL JOINT CONFERENCE ON KNOWLEDGE DISCOVERY, KNOWLEDGE ENGINEERING AND KNOWLEDGE MANAGEMENT (IC3K), 2015, : 17 - 26
  • [37] Study On Feature Selection And Weighting Based On Synonym Merge In Text Categorization
    Lu, Zhenyu
    Lin, Yongmin
    Zhao, Shuang
    Chen, Xuebin
    SECOND INTERNATIONAL CONFERENCE ON FUTURE NETWORKS: ICFN 2010, 2010, : 105 - 109
  • [38] An alternative framework for univariate filter based feature selection for text categorization
    Guru, D. S.
    Suhil, Mahamad
    Raju, Lavanya Narayana
    Kumar, N. Vinay
    PATTERN RECOGNITION LETTERS, 2018, 103 : 23 - 31
  • [39] A NOVEL EMBEDDED FEATURE SELECTION METHOD: A COMPARATIVE STUDY IN THE APPLICATION OF TEXT CATEGORIZATION
    Imani, Maryam Bahojb
    Keyvanpour, Mohammad Reza
    Azmi, Reza
    APPLIED ARTIFICIAL INTELLIGENCE, 2013, 27 (05) : 408 - 427
  • [40] Enhancement of DTP feature selection method for text categorization
    Moyotl-Hernández, E
    Jiménez-Salazar, H
    COMPUTATIONAL LINGUISTICS AND INTELLIGENT TEXT PROCESSING, 2005, 3406 : 719 - 722