Evaluation of decision forests on text categorization

Cited by: 0
Authors
Chen, H [1]
Ho, TK [1]
Affiliation
[1] Univ Calif Berkeley, Sch Informat Mgmt & Systems, Berkeley, CA 94720 USA
Source
Keywords
text categorization; decision forest; decision tree; C4.5; k-nearest-neighbor; OHSUMED; Reuters; evaluation; information retrieval
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Text categorization is useful for indexing documents for information retrieval, filtering parts for document understanding, and summarizing the contents of documents of special interest. We describe a text categorization task and an experiment using documents from the Reuters and OHSUMED collections. We applied the Decision Forest classifier and compared its accuracy to that of the C4.5 and kNN classifiers, using both category-dependent and category-independent term selection schemes. Decision Forest outperforms both C4.5 and kNN in all cases, and category-dependent term selection yields better accuracy. The performance of all three classifiers degrades from the Reuters collection to the OHSUMED collection, but Decision Forest remains superior.
Pages: 191 - 199
Page count: 9
Related papers
50 in total
  • [21] A Similarity Based Supervised Decision Rule for Qualitative Improvement of Text Categorization
    Basu, Tanmay
    Murthy, C. A.
    FUNDAMENTA INFORMATICAE, 2015, 141 (04) : 275 - 295
  • [22] Vectorized Secure Evaluation of Decision Forests
    Malik, Raghav
    Singhal, Vidush
    Gottfried, Benjamin
    Kulkarni, Milind
    PROCEEDINGS OF THE 42ND ACM SIGPLAN INTERNATIONAL CONFERENCE ON PROGRAMMING LANGUAGE DESIGN AND IMPLEMENTATION (PLDI '21), 2021, : 1049 - 1063
  • [23] Performance Evaluation of Text Categorization Algorithms Using an Albanian Corpus
    Trandafili, Evis
    Kote, Nelda
    Biba, Marenglen
    ADVANCES IN INTERNET, DATA & WEB TECHNOLOGIES, 2018, 17 : 537 - 547
  • [24] Learning and evaluation in the presence of class hierarchies: Application to text categorization
    Kiritchenko, Svetlana
    Matwin, Stan
    Nock, Richard
    Famili, A. Fazel
    ADVANCES IN ARTIFICIAL INTELLIGENCE, PROCEEDINGS, 2006, 4013 : 395 - 406
  • [25] An Evaluation of Existing and New Feature Selection Metrics in Text Categorization
    Tasci, Serafettin
    Gungor, Tunga
    23RD INTERNATIONAL SYMPOSIUM ON COMPUTER AND INFORMATION SCIENCES, 2008, : 238 - 243
  • [26] Using rough sets to construct sense type decision trees for text categorization
    Bleyberg, MZ
    Elumalai, A
    JOINT 9TH IFSA WORLD CONGRESS AND 20TH NAFIPS INTERNATIONAL CONFERENCE, PROCEEDINGS, VOLS. 1-5, 2001, : 19 - 24
  • [27] Web Text Categorization for Enterprise Decision Support Based on SVMs - An Application of GBODSS
    Jia, Zhijuan
    Hu, Mingsheng
    Song, Haigang
    Hong, Liu
    ADVANCES IN NEURAL NETWORKS - ISNN 2009, PT 2, PROCEEDINGS, 2009, 5552 : 753 - +
  • [28] A decision-tree-based symbolic rule induction system for text categorization
    Johnson, DE
    Oles, FJ
    Zhang, T
    Goetz, T
    IBM SYSTEMS JOURNAL, 2002, 41 (03) : 428 - 437
  • [29] Text Categorization: Implementation
    Jo, Taeho
    Studies in Big Data, 2019, 45 : 129 - 156
  • [30] Noisy text categorization
    Vinciarelli, A
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2005, 27 (12) : 1882 - 1895