Exploiting hierarchy in text categorization

Cited by: 75
Authors
Weigend A.S. [1]
Wiener E.D. [2]
Pedersen J.O. [2]
Affiliations
[1] Department of Information Systems, Leonard N. Stern School of Business, New York University, 44 West Fourth Street, New York
[2] InfoSeek Corp., 1399 Moffett Park Drive, Sunnyvale
Source
Information Retrieval | 1999, Vol. 1, No. 3
Keywords
Hierarchical models; Information retrieval; Knowledge management; Machine learning; Neural networks; Performance evaluation; Probabilistic models; Problem decomposition; Text categorization; Text mining; Topic spotting
DOI
10.1023/A:1009983522080
Abstract
With the recent dramatic increase in electronic access to documents, text categorization (the task of assigning topics to a given document) has moved to the center of the information sciences and knowledge management. This article uses the structure that is present in the semantic space of topics in order to improve performance in text categorization: according to their meaning, topics can be grouped together into "meta-topics", e.g., gold, silver, and copper are all metals. The proposed architecture matches the hierarchical structure of the topic space, as opposed to a flat model that ignores the structure. It accommodates both single and multiple topic assignments for each document. Its probabilistic interpretation allows its predictions to be combined in a principled way with information from other sources. The first level of the architecture predicts the probabilities of the meta-topic groups. This allows the individual models for each topic on the second level to focus on finer discriminations within the group. Evaluating the performance of a two-level implementation on the Reuters-22173 testbed of newswire articles shows the most significant improvement for rare classes. © 1999 Kluwer Academic Publishers.
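The two-level decomposition described in the abstract can be illustrated with a short sketch of how first-level and second-level outputs might be combined. It assumes, as a simplification, that every topic belongs to exactly one meta-topic group, so a document can carry a topic only if it carries that topic's group; the topic probability then factors as P(topic | doc) = P(meta | doc) × P(topic | meta, doc). The hierarchy, the function names `combine_two_level` and `assign_topics`, the 0.5 threshold, and all probability values are hypothetical. The first- and second-level models themselves are not shown, and this is not the authors' implementation.

```python
from typing import Dict, List

# Hypothetical topic hierarchy: each topic belongs to exactly one meta-topic group.
HIERARCHY: Dict[str, List[str]] = {
    "metals": ["gold", "silver", "copper"],
    "energy": ["crude", "natural-gas"],
}


def combine_two_level(
    meta_probs: Dict[str, float],
    within_group_probs: Dict[str, Dict[str, float]],
) -> Dict[str, float]:
    """Return P(topic | doc) = P(meta | doc) * P(topic | meta, doc) for every topic.

    meta_probs: first-level outputs, one probability per meta-topic group.
    within_group_probs: second-level outputs, keyed by group, then by topic.
    Missing entries are treated as probability 0.
    """
    topic_probs: Dict[str, float] = {}
    for meta, topics in HIERARCHY.items():
        p_meta = meta_probs.get(meta, 0.0)
        for topic in topics:
            p_topic_given_meta = within_group_probs.get(meta, {}).get(topic, 0.0)
            topic_probs[topic] = p_meta * p_topic_given_meta
    return topic_probs


def assign_topics(topic_probs: Dict[str, float], threshold: float = 0.5) -> List[str]:
    """Multiple-topic assignment: keep every topic whose probability clears the threshold."""
    return sorted(t for t, p in topic_probs.items() if p >= threshold)


if __name__ == "__main__":
    probs = combine_two_level(
        {"metals": 0.9, "energy": 0.2},
        {
            "metals": {"gold": 0.8, "silver": 0.6, "copper": 0.1},
            "energy": {"crude": 0.7, "natural-gas": 0.3},
        },
    )
    print(assign_topics(probs))  # → ['gold', 'silver']
```

Thresholding each topic independently, rather than taking a single argmax, is what lets the same combination step serve both single- and multiple-topic assignment; because the outputs remain probabilities, they could also be multiplied or reweighted with evidence from other sources.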
Pages: 193-216 (24 pages)
Related papers
50 in total
  • [1] Exploiting extremely rare features in text categorization
    Schonhofen, Peter
    Benczur, Andras A.
    [J]. MACHINE LEARNING: ECML 2006, PROCEEDINGS, 2006, 4212 : 759 - 766
  • [2] Fully Automatic Text Categorization by Exploiting WordNet
    Li, Jianqiang
    Zhao, Yu
    Liu, Bo
    [J]. INFORMATION RETRIEVAL TECHNOLOGY, PROCEEDINGS, 2009, 5839 : 1 - 12
  • [3] Concept Hierarchy-Based Text Database Categorization
    Weiyi Meng
    Wenxian Wang
    Hongyu Sun
    Clement Yu
    [J]. Knowledge and Information Systems, 2002, 4 (2) : 132 - 150
  • [4] Exploiting Ontology Recommendation Using Text Categorization Approach
    Sarwar, Muhammad Azeem
    Ahmed, Mansoor
    Habib, Asad
    Khalid, Muhammad
    Ali, M. Akhtar
    Raza, Mohsin
    Hussain, Shahid
    Ahmed, Ghufran
    [J]. IEEE ACCESS, 2021, 9 : 27304 - 27322
  • [5] Collaborative text categorization via exploiting sparse coefficients
    Lina Yao
    Quan Z. Sheng
    Xianzhi Wang
    Shengrui Wang
    Xue Li
    Sen Wang
    [J]. World Wide Web, 2018, 21 : 373 - 394
  • [6] Exploiting semantic resources for large scale text categorization
    Jian Qiang Li
    Yu Zhao
    Bo Liu
    [J]. Journal of Intelligent Information Systems, 2012, 39 : 763 - 788
  • [7] Exploiting semantic resources for large scale text categorization
    Li, Jian Qiang
    Zhao, Yu
    Liu, Bo
    [J]. JOURNAL OF INTELLIGENT INFORMATION SYSTEMS, 2012, 39 (03) : 763 - 788
  • [8] Collaborative text categorization via exploiting sparse coefficients
    Yao, Lina
    Sheng, Quan Z.
    Wang, Xianzhi
    Wang, Shengrui
    Li, Xue
    Wang, Sen
    [J]. WORLD WIDE WEB-INTERNET AND WEB INFORMATION SYSTEMS, 2018, 21 (02) : 373 - 394
  • [9] A Hierarchy-Aware Approach to the Multiaspect Text Categorization Problem
    Zadrozny, Slawomir
    Kacprzyk, Janusz
    Gajewski, Marek
    [J]. RECENT DEVELOPMENTS AND THE NEW DIRECTION IN SOFT-COMPUTING FOUNDATIONS AND APPLICATIONS, 2018, 361 : 49 - 62
  • [10] Automatic Category Theme Identification and Hierarchy Generation for Chinese Text Categorization
    Hsin-Chang Yang
    Chung-Hong Lee
    [J]. Journal of Intelligent Information Systems, 2005, 25 : 47 - 67