Exploiting semantic resources for large scale text categorization

Authors
Jian Qiang Li
Yu Zhao
Bo Liu
Affiliations
[1] NEC Laboratories China
Keywords
Web-scale text categorization; Semantic analysis; Semantic information processing
Abstract
A traditional supervised classifier for Text Categorization (TC) is learned from a set of hand-labeled documents. Manual labeling, however, is labor-intensive and time-consuming, especially for a complex TC task with hundreds or thousands of categories. To address this issue, many semi-supervised methods have been proposed that use both labeled and unlabeled documents for TC, but they still need a small set of labeled data for each category. In this paper, we propose a Fully Automatic Categorization approach for Text (FACT) that requires no manual labeling effort. In FACT, lexical databases serve as semantic resources for understanding category names. The approach combines semantic analysis of the category names with statistical analysis of the unlabeled document set to construct training data fully automatically. With the support of lexical databases, we first use each category name to generate a set of features that serves as a representative profile for that category. A set of documents is then labeled according to the representative profile. To reduce the bias that may originate from the category name and the representative profile, document clustering is used to refine the initial labeling. The resulting training data are then used to train a discriminative classifier. Experiments show that one variant of FACT significantly outperforms the state-of-the-art unsupervised TC approach and achieves more than 90% of the F1 performance of the baseline SVM methods, which demonstrates the effectiveness of the proposed approach.
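The labeling steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: `TOY_LEXICON` is a hypothetical stand-in for a real lexical database, the overlap score and the majority-vote refinement are simplifying assumptions, and the cluster assignments are assumed to come from any document clustering algorithm.

```python
from collections import Counter

# Hypothetical stand-in for a lexical database: category name -> related terms.
TOY_LEXICON = {
    "sports": ["game", "team", "match", "player"],
    "finance": ["bank", "stock", "market", "loan"],
}

def build_profile(category, lexicon):
    """Representative profile: the category name plus its related terms."""
    return {category, *lexicon.get(category, [])}

def initial_labels(docs, profiles, min_hits=1):
    """Label each document with the profile it overlaps most.

    Assumed scoring: count of profile terms in the document. Documents with
    fewer than min_hits matches stay unlabeled (None); ties go to the
    category listed first.
    """
    labels = []
    for doc in docs:
        tokens = Counter(doc.lower().split())
        scores = {c: sum(tokens[t] for t in p) for c, p in profiles.items()}
        best = max(scores, key=scores.get)
        labels.append(best if scores[best] >= min_hits else None)
    return labels

def refine_with_clusters(labels, cluster_ids):
    """Assumed refinement: majority vote of initial labels within each cluster."""
    votes = {}
    for label, cid in zip(labels, cluster_ids):
        if label is not None:
            votes.setdefault(cid, Counter())[label] += 1
    return [
        votes[cid].most_common(1)[0][0] if cid in votes else None
        for cid in cluster_ids
    ]

docs = [
    "The team won the match",
    "Stock market rallies as bank lends",
    "The striker scored twice",
    "Loan rates rise",
]
profiles = {c: build_profile(c, TOY_LEXICON) for c in TOY_LEXICON}
labels = initial_labels(docs, profiles)

# Hypothetical cluster assignments, e.g. the output of a clustering step.
cluster_ids = [0, 1, 0, 1]
refined = refine_with_clusters(labels, cluster_ids)
```

In this toy run the third document matches no profile term and is left unlabeled at first; it picks up a label only because it shares a cluster with a labeled document. The refined labels would then serve as training data for a discriminative classifier such as an SVM.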
Pages: 763-788
Number of pages: 25