Exploiting semantic resources for large scale text categorization

Cited by: 0
Authors
Jian Qiang Li
Yu Zhao
Bo Liu
Affiliation
[1] NEC Laboratories China
Keywords
Web-scale text categorization; Semantic analysis; Semantic information processing
DOI
Not available
Abstract
The traditional supervised classifier for Text Categorization (TC) is learned from a set of hand-labeled documents. However, manual data labeling is labor-intensive and time-consuming, especially for complex TC tasks with hundreds or thousands of categories. To address this issue, many semi-supervised methods have been proposed that use both labeled and unlabeled documents for TC, but they still require a small set of labeled data for each category. In this paper, we propose a Fully Automatic Categorization approach for Text (FACT) that requires no manual labeling effort. In FACT, lexical databases serve as semantic resources for understanding category names. FACT combines semantic analysis of the category names with statistical analysis of the unlabeled document set to construct training data fully automatically. With the support of lexical databases, we first use each category name to generate a set of features that serves as a representative profile for the corresponding category. Then, a set of documents is labeled according to these representative profiles. To reduce the possible bias originating from the category name and the representative profile, document clustering is used to refine the quality of the initial labeling. The resulting training data are then used to train a discriminative classifier. Empirical experiments show that one variant of FACT significantly outperforms the state-of-the-art unsupervised TC approach and achieves more than 90% of the F1 performance of the baseline SVM methods, demonstrating the effectiveness of the proposed approach.
Pages: 763-788
Page count: 25
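The abstract describes a pipeline of profile generation, profile-based initial labeling, clustering-based refinement, and classifier training. The following is a minimal Python sketch of the first three steps under stated assumptions, not the authors' implementation: `LEXICON` is a hypothetical stand-in for a lexical database such as WordNet, profile matching is plain term overlap, and the refinement step is k-means (bag-of-words counts, cosine similarity) seeded with the centroids of the initially labeled documents. The paper's actual feature generation, term weighting, and clustering choices are not specified in the abstract.

```python
import math
from collections import Counter

# HYPOTHETICAL stand-in for a lexical database such as WordNet. FACT builds
# category profiles from real lexical resources; this dict only shows the shape.
LEXICON = {
    "sports": ["sports", "game", "team", "player", "match", "score"],
    "finance": ["finance", "market", "stock", "bank", "investor", "price"],
}


def build_profile(category):
    """Representative profile: the category name plus its lexical expansions."""
    return {category, *LEXICON.get(category, [])}


def vectorize(doc):
    """Bag-of-words term counts (lowercased, whitespace-tokenized)."""
    return Counter(doc.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-weight mappings."""
    dot = sum(w * b.get(t, 0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def centroid(vectors):
    """Mean term-weight vector of a non-empty list of documents."""
    total = Counter()
    for v in vectors:
        total.update(v)  # adds counts term by term
    return {t: c / len(vectors) for t, c in total.items()}


def initial_labels(docs, profiles):
    """Label each document with the one category whose profile it overlaps most.

    Documents with no overlap, or a tie between categories, stay unlabeled (None).
    """
    labels = []
    for doc in docs:
        tokens = doc.lower().split()
        scores = {c: sum(t in p for t in tokens) for c, p in profiles.items()}
        best = max(scores.values())
        winners = [c for c, s in scores.items() if s == best]
        labels.append(winners[0] if best > 0 and len(winners) == 1 else None)
    return labels


def refine_with_kmeans(docs, labels, iterations=5):
    """K-means over all documents, seeded with the centroids of the initial labels.

    Every document (including unlabeled ones) is reassigned to its most similar
    centroid, then centroids are recomputed, for a fixed number of iterations.
    """
    vecs = [vectorize(d) for d in docs]
    cats = sorted({lab for lab in labels if lab is not None})
    if not cats:  # nothing was labeled, so there is nothing to seed from
        return list(labels)
    centroids = {
        c: centroid([v for v, lab in zip(vecs, labels) if lab == c]) for c in cats
    }
    assigned = list(labels)
    for _ in range(iterations):
        assigned = [max(cats, key=lambda c: cosine(v, centroids[c])) for v in vecs]
        for c in cats:
            members = [v for v, a in zip(vecs, assigned) if a == c]
            if members:  # keep the previous centroid if a cluster empties
                centroids[c] = centroid(members)
    return assigned


if __name__ == "__main__":
    docs = [
        "the team won the match with a late score",
        "stock market prices fell as investor confidence dropped",
        "the player signed with a new team",
        "the bank raised rates and the market reacted",
        "fans cheered when the striker won with a late goal",
    ]
    profiles = {c: build_profile(c) for c in LEXICON}
    labels = initial_labels(docs, profiles)
    print(labels)  # ['sports', 'finance', 'sports', 'finance', None]
    print(refine_with_kmeans(docs, labels))  # ['sports', 'finance', 'sports', 'finance', 'sports']
```

The fifth document contains no profile term, so it receives no initial label, but the clustering step assigns it to the sports cluster because it shares more vocabulary with those documents. In a FACT-style pipeline, the refined labels would then serve as training data for a discriminative classifier such as an SVM; that step is omitted here.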
Related papers (50 in total)
  • [41] Exploiting poly-lingual documents for improving text categorization effectiveness
    Wei, Chih-Ping
    Yang, Chin-Sheng
    Lee, Ching-Hsien
    Shi, Huihua
    Yang, Christopher C.
    DECISION SUPPORT SYSTEMS, 2014, 57 : 64 - 76
  • [42] Latent semantic analysis for text categorization using neural network
    Yu, Bo
    Xu, Zong-ben
    Li, Cheng-hua
    KNOWLEDGE-BASED SYSTEMS, 2008, 21 (08) : 900 - 904
  • [43] Measurement of Turkish Word Semantic Similarity and Text Categorization Application
    Amasyah, M. Fatih
    Beken, Aytunc
    2009 IEEE 17TH SIGNAL PROCESSING AND COMMUNICATIONS APPLICATIONS CONFERENCE, VOLS 1 AND 2, 2009, : 1 - 4
  • [44] Support Vector Machines based on a semantic kernel for text categorization
    Siolas, G
    d'Alché-Buc, F
    IJCNN 2000: PROCEEDINGS OF THE IEEE-INNS-ENNS INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS, VOL V, 2000, : 205 - 209
  • [45] Large-Scale Hierarchical Text classification Based on Path Semantic Information
    Gao, Feng
    Wu, Chengrong
    Guo, Naiwang
    Zhao, Danfeng
    2009 INTERNATIONAL CONFERENCE ON BUSINESS INTELLIGENCE AND FINANCIAL ENGINEERING, PROCEEDINGS, 2009, : 223 - 227
  • [46] Learning Semantic Similarity for Multi-label Text Categorization
    Li, Li
    Wang, Mengxiang
    Zhang, Longkai
    Wang, Houfeng
    CHINESE LEXICAL SEMANTICS, 2014, 8922 : 260 - 269
  • [47] Building a large-scale testing dataset for conceptual semantic annotation of text
    Wei, Xiao
    Zeng, Daniel Dajun
    Luo, Xiangfeng
    Wu, Wei
    INTERNATIONAL JOURNAL OF COMPUTATIONAL SCIENCE AND ENGINEERING, 2018, 16 (01) : 63 - 72
  • [48] Using category-based semantic field for text categorization
    Wang, QA
    Guan, Y
    Wang, XL
    Xu, ZM
    Proceedings of 2005 International Conference on Machine Learning and Cybernetics, Vols 1-9, 2005, : 3781 - 3786
  • [49] A Comparative Analysis of Strategies for Semantic Short-Text Categorization
    Rosas, Maria V.
    Errecalde, Marcelo L.
    Rosso, Paolo
    PROCESAMIENTO DEL LENGUAJE NATURAL, 2010, (44) : 11 - 18
  • [50] Large-scale semantic text overlapping region retrieval based on deep learning
    Dong L.-L.
    Yang D.
    Zhang X.
    Jilin Daxue Xuebao (Gongxueban)/Journal of Jilin University (Engineering and Technology Edition), 2021, 51 (05) : 1817 - 1822