Feature Transformation and Reduction for Text Classification

Authors
Ferreira, Artur J. [1, 3]
Figueiredo, Mario A. T. [2, 3]
Affiliations
[1] Inst Super Engn Lisboa, Lisbon, Portugal
[2] Inst Super Tecn, Lisbon, Portugal
[3] Inst Telecomunicacoes, Lisbon, Portugal
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Text classification is an important tool for many applications, in supervised, semi-supervised, and unsupervised scenarios. To be processed by machine learning methods, a text (document) is usually represented as a bag-of-words (BoW). A BoW is a large vector of features (usually stored as floating-point values), each representing the relative frequency of occurrence of a given word/term in the document. Typically, the number of features is large and many of them may be non-informative for classification tasks, hence the need for feature transformation, reduction, and selection. In this paper, we propose two efficient algorithms for feature transformation and reduction for BoW-like representations. The proposed algorithms rely on simple statistical analysis of the input pattern, exploiting the BoW and its binary version. The algorithms are evaluated with support vector machine (SVM) and AdaBoost classifiers on standard benchmark datasets. The experimental results show that the reduced/transformed binary features are adequate for text classification problems and that the proposed methods improve the test set error rate.
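
As a rough illustration of the representation the abstract describes, the following sketch builds a relative-frequency BoW vector and its binary version for a few toy documents. It does not reproduce the two proposed algorithms, which the abstract does not specify. The whitespace tokenizer, the toy documents, and the function name `bow_vectors` are assumptions made only for this example.

```python
from collections import Counter


def bow_vectors(docs):
    """Return (vocabulary, relative-frequency rows, binary rows).

    Hypothetical helper for illustration only; tokenization is a plain
    lowercase whitespace split, which is an assumption of this sketch.
    """
    tokenized = [doc.lower().split() for doc in docs]
    # One feature per distinct term, in sorted order for reproducibility.
    vocab = sorted({tok for toks in tokenized for tok in toks})
    freq_rows, binary_rows = [], []
    for toks in tokenized:
        counts = Counter(toks)
        total = len(toks) or 1  # guard against empty documents
        # Relative frequency of each term within this document.
        freq_rows.append([counts[t] / total for t in vocab])
        # Binary version: 1 if the term occurs at all, else 0.
        binary_rows.append([1 if counts[t] > 0 else 0 for t in vocab])
    return vocab, freq_rows, binary_rows


docs = ["the cat sat", "the dog sat on the mat"]
vocab, freq, binary = bow_vectors(docs)
```

Here the binary rows keep only term presence, so they could be stored far more compactly than the floating-point rows; whether they suffice for a given classifier is the kind of question the paper evaluates.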
Pages: 72-81
Number of pages: 10