Interactions between document representation and feature selection in text categorization

Authors
Radovanovic, Milos [1]
Ivanovic, Mirjana [1]
Affiliation
[1] Univ Novi Sad, Fac Sci, Dept Math & Informat, Novi Sad 21000, Serbia
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Many studies in automated Text Categorization focus on the performance of classifiers, with or without considering feature selection methods, but almost as a rule take into account just one document representation. Only relatively recently have detailed studies on the impact of various document representations stepped into the spotlight, showing that there may be statistically significant differences in classifier performance even among variations of the classical bag-of-words model. This paper examines the relationship between the idf transform and several widely used feature selection methods, in the context of Naive Bayes and Support Vector Machine classifiers, on datasets extracted from the dmoz ontology of Web-page descriptions. The described experimental study shows that the idf transform considerably affects the distribution of classification performance over feature selection reduction rates. It also offers an evaluation method that permits the discovery of relationships between different document representations and feature selection methods, independently of absolute differences in classification performance.
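The abstract does not state which idf formula the authors use. As a minimal sketch, assuming the common form idf(t) = log(N / df(t)), the idf transform of a bag-of-words document could look like the following. The function names and the toy corpus are hypothetical and serve only as illustration.

```python
import math
from collections import Counter


def idf_weights(docs):
    """Map each term to log(N / df), where df is the number of documents
    containing the term. This is one common idf variant, assumed here."""
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log(n_docs / count) for term, count in df.items()}


def tfidf(doc, idf):
    """Scale raw term frequencies of one tokenized document by idf."""
    tf = Counter(doc)
    return {term: freq * idf[term] for term, freq in tf.items()}


# Hypothetical toy corpus standing in for Web-page descriptions.
docs = [
    "web page about football scores".split(),
    "web page about chess openings".split(),
    "web directory of football clubs".split(),
]

idf = idf_weights(docs)
vec = tfidf(docs[0], idf)
```

Under this assumed formula, a term that occurs in every document ("web" in the toy corpus) receives weight zero, while rarer terms are weighted up. The transform therefore already reweights features before any explicit selection step is applied.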
Pages: 489-498
Page count: 10