A supervised term selection technique for effective text categorization

Cited by: 13
Authors
Basu, Tanmay [1]
Murthy, C. A. [1]
Affiliations
[1] Indian Stat Inst, Machine Intelligence Unit, Kolkata, India
Keywords
Term selection; Feature selection; Dimensionality reduction; Text categorization; Text mining; Data mining; CLASSIFICATION; ALGORITHM
DOI
10.1007/s13042-015-0421-y
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Term selection methods in text categorization reduce the size of the vocabulary in order to improve the quality of the classifier. Each corpus generally contains many irrelevant and noisy terms, which reduce the effectiveness of text categorization. Term selection therefore focuses on identifying the relevant terms for each category without degrading the quality of text categorization. A new supervised term selection technique is proposed for dimensionality reduction. The method assigns a score to each term of a corpus based on its similarity with all the categories, and then ranks all the terms of the corpus accordingly. Subsequently, the significant terms of each category are selected to create the final subset of terms, irrespective of the size of the category. The performance of the proposed technique is compared with that of nine other term selection methods for categorization of several well-known text corpora using kNN and SVM classifiers. The empirical results show that the proposed method performs significantly better than the other methods in most cases across all the corpora.
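The score, rank, and select pipeline outlined in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the paper's similarity score, so the score used here (share of in-category documents containing a term minus the share of out-of-category documents containing it) is only a placeholder assumption, and the function name `select_terms` and the parameter `k` are hypothetical. Taking the same number of terms from every category is one plausible reading of "irrespective of the size of the category", not a confirmed detail of the method.

```python
from collections import defaultdict


def select_terms(docs, labels, k):
    """Keep the k best-scoring terms of every category (illustrative only).

    docs   : list of token lists, one per document
    labels : list of category labels, aligned with docs
    k      : hypothetical number of terms kept per category
    """
    df_cat = defaultdict(lambda: defaultdict(int))  # per-category document frequency
    df_all = defaultdict(int)                       # corpus-wide document frequency
    n_cat = defaultdict(int)                        # documents per category
    for tokens, label in zip(docs, labels):
        n_cat[label] += 1
        for term in set(tokens):
            df_cat[label][term] += 1
            df_all[term] += 1

    n_docs = len(docs)
    selected, per_category = set(), {}
    for label, n_in in n_cat.items():
        n_out = n_docs - n_in
        scores = {}
        for term, df_in in df_cat[label].items():
            # Placeholder score (an assumption, not the paper's formula).
            p_in = df_in / n_in
            p_out = (df_all[term] - df_in) / n_out if n_out else 0.0
            scores[term] = p_in - p_out
        # Rank by descending score; break ties alphabetically for determinism.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        per_category[label] = ranked[:k]
        selected.update(ranked[:k])
    return selected, per_category


docs = [
    ["goal", "match", "team"],
    ["team", "goal", "coach"],
    ["election", "vote", "team"],
    ["vote", "party", "election"],
]
labels = ["sport", "sport", "politics", "politics"]
selected, per_category = select_terms(docs, labels, k=2)
print(per_category)
```

In this toy corpus the term "team" occurs in both categories, so it scores low and is left out of the final subset, which is the kind of noisy term that term selection is meant to discard.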
Pages: 877-892
Page count: 16
Related papers
50 in total
  • [1] A supervised term selection technique for effective text categorization
    Tanmay Basu
    C. A. Murthy
    [J]. International Journal of Machine Learning and Cybernetics, 2016, 7 : 877 - 892
  • [2] Supervised term weighting for automated text categorization
    Debole, F
    Sebastiani, F
    [J]. TEXT MINING AND ITS APPLICATIONS, 2004, 138 : 81 - 97
  • [3] Uncertainty and term selection in text categorization
    Peters, CMEE
    Koster, CHA
    [J]. INTERNATIONAL JOURNAL OF UNCERTAINTY FUZZINESS AND KNOWLEDGE-BASED SYSTEMS, 2003, 11 (01) : 115 - 137
  • [4] An Effective Feature Selection Method for Text Categorization
    Qiu, Xipeng
    Zhou, Jinlong
    Huang, Xuanjing
    [J]. ADVANCES IN KNOWLEDGE DISCOVERY AND DATA MINING, PT I: 15TH PACIFIC-ASIA CONFERENCE, PAKDD 2011, 2011, 6634 : 50 - 61
  • [5] A New Supervised Term Ranking Method for Text Categorization
    Mammadov, Musa
    Yearwood, John
    Zhao, Lei
    [J]. AI 2010: ADVANCES IN ARTIFICIAL INTELLIGENCE, 2010, 6464 : 102 - 111
  • [6] A feature selection and classification technique for text categorization
    Girgis, MR
    Aly, AA
    [J]. INTERNATIONAL JOURNAL OF COOPERATIVE INFORMATION SYSTEMS, 2003, 12 (04) : 441 - 454
  • [7] A Model for Term Selection in Text Categorization Problems
    Cannas, Laura Maria
    Dessi, Nicoletta
    Dessi, Stefania
    [J]. 2012 23RD INTERNATIONAL WORKSHOP ON DATABASE AND EXPERT SYSTEMS APPLICATIONS (DEXA), 2012 : 169 - 173
  • [8] Supervised and Traditional Term Weighting Methods for Automatic Text Categorization
    Lan, Man
    Tan, Chew Lim
    Su, Jian
    Lu, Yue
    [J]. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2009, 31 (04) : 721 - 735
  • [9] Effective Text Classification by a Supervised Feature Selection Approach
    Basu, Tanmay
    Murthy, C. A.
    [J]. 12TH IEEE INTERNATIONAL CONFERENCE ON DATA MINING WORKSHOPS (ICDMW 2012), 2012 : 918 - 925
  • [10] Supervised term weighting centroid-based classifiers for text categorization
    Tam T. Nguyen
    Kuiyu Chang
    Siu Cheung Hui
    [J]. Knowledge and Information Systems, 2013, 35 : 61 - 85