Evaluation of decision forests on text categorization

Cited by: 0
Authors
Chen, H [1]
Ho, TK [1]
Affiliation
[1] Univ Calif Berkeley, Sch Informat Mgmt & Systems, Berkeley, CA 94720 USA
Source
Keywords
text categorization; decision forest; decision tree; C4.5; k-nearest-neighbor; OHSUMED; Reuters; evaluation; information retrieval
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Text categorization is useful for indexing documents for information retrieval, filtering parts for document understanding, and summarizing the contents of documents of special interest. We describe a text categorization task and an experiment using documents from the Reuters and OHSUMED collections. We applied the Decision Forest classifier and compared its accuracies to those of the C4.5 and kNN classifiers, using both category-dependent and category-independent term selection schemes. We find that Decision Forest outperforms both C4.5 and kNN in all cases, and that category-dependent term selection yields better accuracies. The performance of all three classifiers degrades from the Reuters collection to the OHSUMED collection, but Decision Forest remains superior.
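The abstract contrasts category-independent and category-dependent term selection without saying how either scheme scores terms. The following is a minimal sketch of the distinction only, not the authors' method. The toy corpus, the function names, and both scoring rules (raw document frequency, and in-category rate minus out-of-category rate) are assumptions made for illustration.

```python
from collections import Counter

# Hypothetical toy corpus: each item is (tokens, set of category labels).
DOCS = [
    (["wheat", "crop", "price"], {"grain"}),
    (["wheat", "export", "price"], {"grain"}),
    (["oil", "barrel", "price"], {"crude"}),
    (["oil", "export", "opec"], {"crude"}),
]


def document_frequency(docs):
    """Count, for each term, the number of documents that contain it."""
    df = Counter()
    for tokens, _ in docs:
        df.update(set(tokens))
    return df


def select_independent(docs, k):
    """Category-independent: one shared term set for every category.

    Assumed scoring rule: highest document frequency over the whole corpus.
    """
    df = document_frequency(docs)
    ranked = sorted(df, key=lambda t: (-df[t], t))  # ties broken alphabetically
    return ranked[:k]


def select_dependent(docs, category, k):
    """Category-dependent: a separate term set chosen for each category.

    Assumed scoring rule: fraction of in-category documents containing the
    term minus the fraction of out-of-category documents containing it.
    """
    pos = [d for d in docs if category in d[1]]
    neg = [d for d in docs if category not in d[1]]
    df_pos, df_neg = document_frequency(pos), document_frequency(neg)

    def score(term):
        return df_pos[term] / max(len(pos), 1) - df_neg[term] / max(len(neg), 1)

    terms = set(df_pos) | set(df_neg)
    ranked = sorted(terms, key=lambda t: (-score(t), t))
    return ranked[:k]


shared_terms = select_independent(DOCS, 2)
grain_terms = select_dependent(DOCS, "grain", 2)
crude_terms = select_dependent(DOCS, "crude", 2)
```

On this toy corpus the shared set is dominated by the frequent but uninformative term "price", while the per-category sets pick terms tied to each category. This gives one plausible intuition for why per-category selection could help, though the record itself does not explain the result.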
Pages: 191-199
Number of pages: 9
Related papers
50 in total
  • [1] Text Categorization with Diversity Random Forests
    Yang, Chun
    Yin, Xu-Cheng
    Huang, Kaizhu
    NEURAL INFORMATION PROCESSING, ICONIP 2014, PT III, 2014, 8836 : 317 - 324
  • [2] Research of Text Categorization Model based on Random Forests
    Xue, Dashen
    Li, Fengxin
    2015 IEEE INTERNATIONAL CONFERENCE ON COMPUTATIONAL INTELLIGENCE AND COMMUNICATION TECHNOLOGY CICT 2015, 2015, : 173 - 176
  • [3] Automatic categorization of fanatic text using random forests
    Klema, Jiri
    Almonayyes, Ahmad
    KUWAIT JOURNAL OF SCIENCE & ENGINEERING, 2006, 33 (02): : 1 - 18
  • [4] AUTOMATED LEARNING OF DECISION RULES FOR TEXT CATEGORIZATION
    APTE, C
    DAMERAU, F
    WEISS, SM
    ACM TRANSACTIONS ON INFORMATION SYSTEMS, 1994, 12 (03) : 233 - 251
  • [5] An Evaluation of Statistical Approaches to Text Categorization
    Yiming Yang
    Information Retrieval, 1999, 1 (1-2): : 69 - 90
  • [6] Classification decision combination for text categorization: An experimental study
    Bi, YX
    Bell, D
    Wang, H
    Guo, GD
    Dubitzky, W
    DATABASE AND EXPERT SYSTEMS APPLICATIONS, PROCEEDINGS, 2004, 3180 : 222 - 231
  • [7] An approach to text categorization with SVM and visualization aided decision
    Hu, J
    Huang, HK
    International Conference on Computing, Communications and Control Technologies, Vol 2, Proceedings, 2004, : 7 - 10
  • [8] Improving Arabic Text Categorization using Decision Trees
    Harrag, Fouzi
    El-Qawasmeh, Eyas
    Pichappan, Pit
    NDT: 2009 FIRST INTERNATIONAL CONFERENCE ON NETWORKED DIGITAL TECHNOLOGIES, 2009, : 110 - +
  • [9] An evaluation of passage-based text categorization
    Kim, J
    Kim, MH
    JOURNAL OF INTELLIGENT INFORMATION SYSTEMS, 2004, 23 (01) : 47 - 65