Discriminative features for text document classification

Author
K. Torkkola
Affiliation
[1] Motorola Labs,
Keywords
Dimension reduction; Linear discriminant analysis; Random transforms; Text classification;
DOI
Not available
Abstract
The bag-of-words approach to text document representation typically yields vectors on the order of 5000–20,000 components. To make effective use of various statistical classifiers, it may be necessary to reduce the dimensionality of this representation. We point out deficiencies in class discrimination of two popular dimension reduction methods: Latent Semantic Indexing (LSI), and sequential feature selection according to some relevant criterion. As a remedy, we suggest feature transforms based on Linear Discriminant Analysis (LDA). Since LDA requires operating on matrices that are both large and dense, we propose an efficient intermediate dimension reduction step using either a random transform or LSI. We report good classification results with the combined feature transform on a subset of the Reuters-21578 database. Drastic reduction of the feature vector dimensionality from 5000 to 12 actually improves the classification performance.
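The two-stage transform described in the abstract (a random transform as the intermediate reduction, followed by LDA) can be illustrated with a minimal NumPy sketch. This is not the author's implementation: the synthetic Poisson "term count" data, the dimensions (500 terms, 50 intermediate dimensions, 3 classes), the regularization constant, the nearest-centroid classifier, and all function names are assumptions chosen only for illustration.

```python
import numpy as np


def random_projection_matrix(d, k, rng):
    # Dense Gaussian random matrix (assumed form of the random transform),
    # scaled so squared norms are preserved in expectation.
    return rng.standard_normal((d, k)) / np.sqrt(k)


def fit_lda(X, y, reg=1e-3):
    """Return a projection matrix with (n_classes - 1) discriminant directions."""
    classes = np.unique(y)
    d = X.shape[1]
    overall_mean = X.mean(axis=0)
    Sw = reg * np.eye(d)          # within-class scatter, lightly regularized
    Sb = np.zeros((d, d))         # between-class scatter
    for c in classes:
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)
        diff = (mc - overall_mean)[:, None]
        Sb += len(Xc) * (diff @ diff.T)
    # Solve the generalized eigenproblem Sb w = lambda Sw w by whitening
    # with the Cholesky factor of Sw, which keeps the problem symmetric.
    L = np.linalg.cholesky(Sw)
    M = np.linalg.solve(L, np.linalg.solve(L, Sb).T)
    M = (M + M.T) / 2             # remove round-off asymmetry
    _, evecs = np.linalg.eigh(M)  # eigenvalues in ascending order
    top = evecs[:, ::-1][:, : len(classes) - 1]
    return np.linalg.solve(L.T, top)


def nearest_centroid_predict(Z_train, y_train, Z_test):
    # Assign each test point to the class with the closest training centroid.
    classes = np.unique(y_train)
    centroids = np.stack([Z_train[y_train == c].mean(axis=0) for c in classes])
    d2 = ((Z_test[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return classes[d2.argmin(axis=1)]


def demo(seed=0):
    rng = np.random.default_rng(seed)
    n_classes, per_class, d, k = 3, 100, 500, 50  # illustrative sizes only
    # Synthetic bag-of-words counts: each class has 20 "characteristic" terms.
    rates = np.full((n_classes, d), 0.1)
    for c in range(n_classes):
        rates[c, c * 20:(c + 1) * 20] = 2.0
    y = np.repeat(np.arange(n_classes), per_class)
    X = rng.poisson(rates[y]).astype(float)
    idx = rng.permutation(len(y))
    train, test = idx[:150], idx[150:]
    # Stage 1: random transform from d to k dimensions.
    R = random_projection_matrix(d, k, rng)
    # Stage 2: LDA in the k-dimensional space, giving n_classes - 1 features.
    W = fit_lda(X[train] @ R, y[train])
    Z_train, Z_test = X[train] @ R @ W, X[test] @ R @ W
    pred = nearest_centroid_predict(Z_train, y[train], Z_test)
    return Z_test, float((pred == y[test]).mean())


if __name__ == "__main__":
    Z_test, accuracy = demo()
    print(Z_test.shape, accuracy)
```

The point of the intermediate step in this sketch is that the scatter matrices are formed in the 50-dimensional projected space rather than the original 500-dimensional one, so the dense eigenproblem stays small; LSI could be substituted for the random matrix `R` in the same position.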
Pages: 301-308
Number of pages: 7