Discriminative features for text document classification

Cited by: 0
Author: K. Torkkola
Affiliation: Motorola Labs
Source: Formal Pattern Analysis & Applications, 2004, Vol. 6
Keywords: Dimension reduction; Linear discriminant analysis; Random transforms; Text classification
DOI: not available
Abstract
The bag-of-words approach to text document representation typically yields document vectors on the order of 5,000–20,000 components. To make effective use of various statistical classifiers, it may be necessary to reduce the dimensionality of this representation. We point out deficiencies in the class discrimination of two popular dimension-reduction methods: Latent Semantic Indexing (LSI) and sequential feature selection according to some relevant criterion. As a remedy, we suggest feature transforms based on Linear Discriminant Analysis (LDA). Since LDA requires operating with matrices that are both large and dense, we propose an efficient intermediate dimension-reduction step using either a random transform or LSI. We report good classification results with the combined feature transform on a subset of the Reuters-21578 database. Drastically reducing the feature vector dimensionality from 5,000 to 12 actually improves classification performance.
Pages: 301–308
Number of pages: 7
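The combined transform described in the abstract (a cheap intermediate reduction followed by LDA) can be sketched in NumPy as below. This is a minimal illustration under stated assumptions, not the author's implementation: the Gaussian random matrix scaled by 1/√k, the small ridge term added to the within-class scatter, the nearest-class-mean classifier, and the synthetic Poisson "documents" are all choices made here for illustration. The helper names (`random_projection`, `lsi_projection`, `lda_fit`, `nearest_mean_predict`) are hypothetical. LDA yields at most C − 1 discriminant directions for C classes, so the three synthetic classes end up as 2-dimensional feature vectors.

```python
import numpy as np

def random_projection(X, k, seed=0):
    """Map rows of X (docs x terms) to k dims with a dense Gaussian random matrix.
    Assumption: entries i.i.d. N(0, 1/k); other random-transform variants exist."""
    rng = np.random.default_rng(seed)
    R = rng.normal(0.0, 1.0 / np.sqrt(k), size=(X.shape[1], k))
    return X @ R

def lsi_projection(X, k):
    """LSI-style reduction: project onto the top-k right singular vectors of X (uncentered)."""
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    return X @ Vt[:k].T

def lda_fit(Z, y, reg=1e-3):
    """Return W (k x (C-1)) solving Sb w = lambda Sw w for the C-1 largest eigenvalues."""
    classes = np.unique(y)
    mean_all = Z.mean(axis=0)
    k = Z.shape[1]
    Sw = np.zeros((k, k))  # within-class scatter
    Sb = np.zeros((k, k))  # between-class scatter
    for c in classes:
        Zc = Z[y == c]
        mc = Zc.mean(axis=0)
        D = Zc - mc
        Sw += D.T @ D
        d = (mc - mean_all)[:, None]
        Sb += Zc.shape[0] * (d @ d.T)
    Sw += reg * np.eye(k)  # assumed ridge term so Sw is safely invertible
    # Whiten with Sw^(-1/2), then solve a symmetric eigenproblem on the whitened Sb.
    evals, evecs = np.linalg.eigh(Sw)
    Sw_inv_sqrt = evecs @ np.diag(1.0 / np.sqrt(evals)) @ evecs.T
    m_evals, m_evecs = np.linalg.eigh(Sw_inv_sqrt @ Sb @ Sw_inv_sqrt)
    order = np.argsort(m_evals)[::-1][: len(classes) - 1]
    return Sw_inv_sqrt @ m_evecs[:, order]

def nearest_mean_predict(Ftr, ytr, Fte):
    """Assign each row of Fte to the class whose training mean is closest (Euclidean)."""
    classes = np.unique(ytr)
    means = np.stack([Ftr[ytr == c].mean(axis=0) for c in classes])
    dist = ((Fte[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
    return classes[np.argmin(dist, axis=1)]

# Synthetic bag-of-words counts: 3 classes, each with 50 "topic" terms at a higher rate.
rng = np.random.default_rng(1)
n_per, n_terms, n_classes = 100, 2000, 3
rates = np.full((n_classes, n_terms), 0.1)
for c in range(n_classes):
    rates[c, c * 50:(c + 1) * 50] = 3.0
y = np.repeat(np.arange(n_classes), n_per)
X = rng.poisson(rates[y]).astype(float)

# Random transform 2000 -> 100, then LDA 100 -> C-1 = 2.
Z = random_projection(X, k=100)
W = lda_fit(Z, y)
F = Z @ W
acc = (nearest_mean_predict(F, y, F) == y).mean()

# Same pipeline with LSI as the intermediate step.
Z_lsi = lsi_projection(X, k=100)
F_lsi = Z_lsi @ lda_fit(Z_lsi, y)
acc_lsi = (nearest_mean_predict(F_lsi, y, F_lsi) == y).mean()
print(f"random+LDA train acc: {acc:.3f}, LSI+LDA train acc: {acc_lsi:.3f}")
```

The accuracies printed here are measured on the synthetic training data only, so they say nothing about the generalization result reported in the abstract. The intermediate step matters mainly for cost: LDA then works with 100 × 100 scatter matrices instead of 2000 × 2000 ones.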