Feature selection for text classification: A review

Cited by: 182
Authors
Deng, Xuelian [1]
Li, Yuqing [1]
Weng, Jian [2]
Zhang, Jilian [3]
Affiliations
[1] Guangxi Univ Chinese Med, Coll Publ Hlth & Management, Guangxi, Peoples R China
[2] Jinan Univ, Coll Informat Sci & Technol, Guangzhou, Guangdong, Peoples R China
[3] Jinan Univ, Coll Cyber Secur, Guangzhou, Guangdong, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Feature Selection; Text classification; Text classifiers; Multimedia; HYBRID FEATURE-SELECTION; GENETIC ALGORITHM; SIMILARITY MEASURE; NAIVE BAYES; IMAGE; CATEGORIZATION; DISTANCE; REGRESSION;
DOI
10.1007/s11042-018-6083-5
Chinese Library Classification
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Big multimedia data is inherently heterogeneous: it may mix video, audio, text, and images. This is due to the prevalence of novel applications in recent years, such as social media, video sharing, and location-based services (LBS). In many multimedia applications, for example video/image tagging and multimedia recommendation, text classification techniques have been used extensively to facilitate multimedia data processing. In this paper, we give a comprehensive review of feature selection techniques for text classification. We begin by introducing some popular representation schemes for documents and the similarity measures used in text classification. Then, we review the most popular text classifiers, including the Nearest Neighbor (NN) method, Naive Bayes (NB), Support Vector Machine (SVM), Decision Tree (DT), and Neural Networks. Next, we survey four feature selection models, namely filter, wrapper, embedded, and hybrid, and discuss the pros and cons of state-of-the-art feature selection approaches. Finally, we conclude the paper and briefly introduce some interesting feature selection work that does not belong to the four models.
Pages: 3797 - 3816
Page count: 20
Related papers
50 in total
  • [1] Feature selection for text classification: A review
    Xuelian Deng
    Yuqing Li
    Jian Weng
    Jilian Zhang
    [J]. Multimedia Tools and Applications, 2019, 78 : 3797 - 3816
  • [2] A Review on Feature Selection and Feature Extraction for Text Classification
    Shah, Foram P.
    Patel, Vibha
    [J]. PROCEEDINGS OF THE 2016 IEEE INTERNATIONAL CONFERENCE ON WIRELESS COMMUNICATIONS, SIGNAL PROCESSING AND NETWORKING (WISPNET), 2016, : 2264 - 2268
  • [3] Filter feature selection methods for text classification: a review
    Ming, Hong
    Heyong, Wang
    [J]. MULTIMEDIA TOOLS AND APPLICATIONS, 2023, 83 (1) : 2053 - 2091
  • [4] Filter feature selection methods for text classification: a review
    Hong Ming
    Wang Heyong
    [J]. Multimedia Tools and Applications, 2024, 83 : 2053 - 2091
  • [5] Feature Selection in Text Classification
    Sahin, Durmus Ozkan
    Ates, Nurullah
    Kilic, Erdal
    [J]. 2016 24TH SIGNAL PROCESSING AND COMMUNICATION APPLICATION CONFERENCE (SIU), 2016, : 1777 - 1780
  • [6] Arabic Text Classification: A Review Study on Feature Selection Methods
    Hijazi, Musab Mustafa
    Zeki, Akram
    Ismail, Amelia
    [J]. 2021 22ND INTERNATIONAL ARAB CONFERENCE ON INFORMATION TECHNOLOGY (ACIT), 2021, : 554 - 559
  • [7] Feature selection methods for text classification: a systematic literature review
    Julliano Trindade Pintas
    Leandro A. F. Fernandes
    Ana Cristina Bicharra Garcia
    [J]. Artificial Intelligence Review, 2021, 54 : 6149 - 6200
  • [8] Feature selection methods for text classification: a systematic literature review
    Pintas, Julliano Trindade
    Fernandes, Leandro A. F.
    Garcia, Ana Cristina Bicharra
    [J]. ARTIFICIAL INTELLIGENCE REVIEW, 2021, 54 (08) : 6149 - 6200
  • [9] Hybrid feature selection for text classification
    Gunal, Serkan
    [J]. TURKISH JOURNAL OF ELECTRICAL ENGINEERING AND COMPUTER SCIENCES, 2012, 20 : 1296 - 1311
  • [10] Feature Selection for Ordinal Text Classification
    Baccianella, Stefano
    Esuli, Andrea
    Sebastiani, Fabrizio
    [J]. NEURAL COMPUTATION, 2014, 26 (03) : 557 - 591