Fully Automatic Text Categorization by Exploiting WordNet

Authors
Li, Jianqiang [1 ]
Zhao, Yu [1 ]
Liu, Bo [1 ]
Affiliation
[1] NEC Labs China, Beijing 100084, Peoples R China
Keywords
WordNet; Text Categorization; Semantics
DOI
Not available
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
This paper proposes FACT, a Fully Automatic Categorization approach for Text that exploits semantic features from WordNet together with document clustering. FACT constructs its training data automatically from knowledge of the category name. With the support of WordNet, it first uses the category name to generate a set of features for the corresponding category, and then labels a set of documents according to those features. To reduce the bias that may originate from the category name and the generated features, document clustering is used to refine the initial labeling. The resulting training data are then used to train a discriminative classifier. Experiments show that, at its best, FACT achieves more than 90% of the F1 measure of the baseline SVM classifiers, which demonstrates the effectiveness of the proposed approach.
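
The first two steps described in the abstract (generating features from a category name, then labeling documents by those features) can be illustrated with a minimal sketch. This is not the authors' implementation: `TOY_LEXICON` is a hypothetical hand-written stand-in for WordNet lookups, the function names `category_features` and `initial_label` are invented for illustration, and the scoring rule (a simple count of matching tokens) is an assumption. The clustering refinement and classifier training steps are omitted.

```python
from collections import Counter

# Hypothetical stand-in for WordNet: category name -> related terms.
# A real system would query WordNet itself rather than a fixed table.
TOY_LEXICON = {
    "sports": {"sport", "game", "team", "match", "athlete"},
    "finance": {"bank", "money", "stock", "market", "investment"},
}


def category_features(name):
    """Return the category name plus its related terms from the toy lexicon."""
    return {name} | TOY_LEXICON.get(name, set())


def initial_label(doc, categories):
    """Label a document with the category whose features it matches most often.

    Returns None when no feature matches or when the top score is tied,
    so that ambiguous documents are left unlabeled (an assumed policy).
    """
    tokens = Counter(doc.lower().split())
    scores = {c: sum(tokens[f] for f in category_features(c)) for c in categories}
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    best, best_score = ranked[0]
    if best_score == 0 or (len(ranked) > 1 and ranked[1][1] == best_score):
        return None
    return best


docs = [
    "the team won the match",
    "the bank raised money on the stock market",
    "weather was cloudy today",
]
labels = [initial_label(d, ["sports", "finance"]) for d in docs]
print(labels)
```

In this sketch, documents that match no feature stay unlabeled; in the approach the abstract describes, the automatically labeled documents would then be refined by clustering before being used as training data.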
Pages: 1 / 12
Page count: 12