Exploiting semantic resources for large scale text categorization

Cited by: 0
Authors
Jian Qiang Li
Yu Zhao
Bo Liu
Affiliation
[1] NEC Laboratories China,
Keywords
Web-scale text categorization; Semantic analysis; Semantic information processing;
DOI
Not available
Abstract
A traditional supervised classifier for Text Categorization (TC) is learned from a set of hand-labeled documents. However, manual data labeling is labor intensive and time consuming, especially for a complex TC task with hundreds or thousands of categories. To address this issue, many semi-supervised methods have been proposed that use both labeled and unlabeled documents for TC, but they still need a small set of labeled data for each category. In this paper, we propose a Fully Automatic Categorization approach for Text (FACT), which requires no manual labeling effort. In FACT, lexical databases serve as semantic resources for understanding category names. FACT combines semantic analysis of the category names with statistical analysis of the unlabeled document set to construct training data fully automatically. With the support of lexical databases, we first use the category name to generate a set of features that serves as a representative profile for the corresponding category. Then, a set of documents is labeled according to the representative profile. To reduce possible bias originating from the category name and the representative profile, document clustering is used to refine the quality of the initial labeling. The resulting training data are then used to train a discriminative classifier. Empirical experiments show that one variant of our FACT approach significantly outperforms the state-of-the-art unsupervised TC approach. It achieves more than 90% of the F1 performance of the baseline SVM methods, which demonstrates the effectiveness of the proposed approach.
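The pipeline described in the abstract (category name, representative profile, initial labeling, refinement) can be illustrated with a minimal sketch. This is not the authors' implementation: `TOY_LEXICON` is a hypothetical hand-written stand-in for a lexical database, the function names are invented for illustration, and a single centroid-reassignment pass is used as a simplified substitute for the document clustering step.

```python
from collections import Counter
import math

# Hypothetical toy lexicon standing in for a real lexical database.
TOY_LEXICON = {
    "sports": ["game", "team", "player", "match"],
    "finance": ["bank", "stock", "market", "money"],
}

def build_profile(category_name, lexicon):
    """Expand a category name into a set of representative feature words."""
    return {category_name, *lexicon.get(category_name, [])}

def tokenize(text):
    """Lowercase and split on whitespace (deliberately simplistic)."""
    return text.lower().split()

def initial_labels(docs, profiles, min_hits=1):
    """Assign each document the category whose profile it overlaps most.
    Documents with fewer than min_hits matching tokens stay unlabeled (None)."""
    labels = []
    for doc in docs:
        tokens = tokenize(doc)
        hits = {c: sum(t in p for t in tokens) for c, p in profiles.items()}
        best = max(hits, key=hits.get)
        labels.append(best if hits[best] >= min_hits else None)
    return labels

def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def refine_labels(docs, labels):
    """One centroid-reassignment pass: a simplified stand-in for clustering.
    Every document, labeled or not, moves to its most similar centroid."""
    vectors = [Counter(tokenize(d)) for d in docs]
    centroids = {}
    for vec, lab in zip(vectors, labels):
        if lab is not None:
            centroids.setdefault(lab, Counter()).update(vec)
    if not centroids:
        return list(labels)  # nothing was labeled, so nothing to refine
    return [max(centroids, key=lambda c: cosine(v, centroids[c])) for v in vectors]

if __name__ == "__main__":
    docs = [
        "team won match",
        "bank raised money",
        "player scored goal team",
        "stock market fell",
        "goalkeeper scored goal",  # no profile word; relies on refinement
    ]
    profiles = {c: build_profile(c, TOY_LEXICON) for c in TOY_LEXICON}
    first_pass = initial_labels(docs, profiles)
    print(first_pass)
    print(refine_labels(docs, first_pass))
```

In this sketch the last document contains no profile word and is left unlabeled by the first pass; the refinement pass assigns it a label because it shares vocabulary with documents that were labeled. The refined labels would then serve as training data for a discriminative classifier, which is omitted here.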
Pages: 763-788
Page count: 25
Related papers
50 in total
  • [1] Exploiting semantic resources for large scale text categorization
    Li, Jian Qiang
    Zhao, Yu
    Liu, Bo
    JOURNAL OF INTELLIGENT INFORMATION SYSTEMS, 2012, 39 (03) : 763 - 788
  • [2] A Fully Semantic Approach to Large Scale Text Categorization
    Dessi, Nicoletta
    Dessi, Stefania
    Pes, Barbara
    INFORMATION SCIENCES AND SYSTEMS 2013, 2013, 264 : 149 - 157
  • [3] SVM-based semantic text categorization for large scale web information organization
    Fu, P
    Zhang, DY
    Ma, ZF
    Dong, H
    ADVANCES IN NEURAL NETWORKS - ISNN 2005, PT 1, PROCEEDINGS, 2005, 3496 : 931 - 936
  • [4] Kernel-based semantic text categorization for large scale web information organization
    Fu, XH
    Ma, ZF
    Feng, BQ
    GRID AND COOPERATIVE COMPUTING GCC 2004, PROCEEDINGS, 2004, 3251 : 389 - 396
  • [5] Exploiting hierarchy in text categorization
    Weigend A.S.
    Wiener E.D.
    Pedersen J.O.
    Information Retrieval, 1999, 1 (3) : 193 - 216
  • [6] Semi-Supervised Learning in Large Scale Text Categorization
    许泽文
    李建强
    刘博
    毕敬
    李蓉
    毛睿
    Journal of Shanghai Jiaotong University(Science), 2017, 22 (03) : 291 - 302
  • [7] Semi-supervised learning in large scale text categorization
    Xu Z.
    Li J.
    Liu B.
    Bi J.
    Li R.
    Mao R.
    Journal of Shanghai Jiaotong University (Science), 2017, 22 (3) : 291 - 302
  • [8] Large-scale Bayesian logistic regression for text categorization
    Genkin, Alexander
    Lewis, David D.
    Madigan, David
    TECHNOMETRICS, 2007, 49 (03) : 291 - 304
  • [9] Exploiting extremely rare features in text categorization
    Schonhofen, Peter
    Benczur, Andras A.
    MACHINE LEARNING: ECML 2006, PROCEEDINGS, 2006, 4212 : 759 - 766
  • [10] Fully Automatic Text Categorization by Exploiting WordNet
    Li, Jianqiang
    Zhao, Yu
    Liu, Bo
    INFORMATION RETRIEVAL TECHNOLOGY, PROCEEDINGS, 2009, 5839 : 1 - 12