Feature selection for text classification: A review

被引:0
|
作者
Xuelian Deng
Yuqing Li
Jian Weng
Jilian Zhang
机构
[1] Guangxi University of Chinese Medicine,College of Public Health and Management
[2] Jinan University,College of Information Science and Technology
[3] Jinan University,College of Cyber Security
来源
关键词
Feature Selection; Text classification; Text classifiers; Multimedia;
D O I
暂无
中图分类号
学科分类号
摘要
Big multimedia data is heterogeneous in essence, that is, the data may be a mixture of video, audio, text, and images. This is due to the prevalence of novel applications in recent years, such as social media, video sharing, and location based services (LBS), etc. In many multimedia applications, for example, video/image tagging and multimedia recommendation, text classification techniques have been used extensively to facilitate multimedia data processing. In this paper, we give a comprehensive review on feature selection techniques for text classification. We begin by introducing some popular representation schemes for documents, and similarity measures used in text classification. Then, we review the most popular text classifiers, including Nearest Neighbor (NN) method, Naïve Bayes (NB), Support Vector Machine (SVM), Decision Tree (DT), and Neural Networks. Next, we survey four feature selection models, namely the filter, wrapper, embedded and hybrid, discussing pros and cons of the state-of-the-art feature selection approaches. Finally, we conclude the paper and give a brief introduction to some interesting feature selection work that does not belong to the four models.
引用
收藏
页码:3797 / 3816
页数:19
相关论文
共 50 条
  • [21] Effective feature selection technique for text classification
    Seetha, Hari
    Murty, M. Narasimha
    Saravanan, R.
    [J]. INTERNATIONAL JOURNAL OF DATA MINING MODELLING AND MANAGEMENT, 2015, 7 (03) : 165 - 184
  • [22] A feature selection and classification technique for text categorization
    Girgis, MR
    Aly, AA
    [J]. INTERNATIONAL JOURNAL OF COOPERATIVE INFORMATION SYSTEMS, 2003, 12 (04) : 441 - 454
  • [23] A new approach to feature selection in text classification
    Wang, Y
    Wang, XJ
    [J]. PROCEEDINGS OF 2005 INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND CYBERNETICS, VOLS 1-9, 2005, : 3814 - 3819
  • [24] Feature selection improves text classification accuracy
    不详
    [J]. IEEE INTELLIGENT SYSTEMS, 2005, 20 (06) : 75 - 75
  • [25] Composite Feature Extraction and Selection for Text Classification
    Wan, Chuan
    Wang, Yuling
    Liu, Yaoze
    Ji, Jinchao
    Feng, Guozhong
    [J]. IEEE ACCESS, 2019, 7 : 35208 - 35219
  • [26] Higher order feature selection for text classification
    Jan Bakus
    Mohamed S. Kamel
    [J]. Knowledge and Information Systems, 2006, 9 : 468 - 491
  • [27] Higher order feature selection for text classification
    Bakus, J
    Kamel, MS
    [J]. KNOWLEDGE AND INFORMATION SYSTEMS, 2006, 9 (04) : 468 - 491
  • [28] Feature selection for text classification with Naive Bayes
    Chen, Jingnian
    Huang, Houkuan
    Tian, Shengfeng
    Qu, Youli
    [J]. EXPERT SYSTEMS WITH APPLICATIONS, 2009, 36 (03) : 5432 - 5435
  • [29] A feature selection algorithm with redundancy reduction for text classification
    Saleh, Sherine Nagi
    El-Sonbaty, Yasser
    [J]. 2007 22ND INTERNATIONAL SYMPOSIUM ON COMPUTER AND INFORMATION SCIENCES, 2007, : 130 - +
  • [30] Effective Text Classification by a Supervised Feature Selection Approach
    Basu, Tanmay
    Murthy, C. A.
    [J]. 12TH IEEE INTERNATIONAL CONFERENCE ON DATA MINING WORKSHOPS (ICDMW 2012), 2012, : 918 - 925