Automatic learning features using bootstrapping for text categorization

Authors
Chen, WL [1 ]
Zhu, JB [1 ]
Wu, HL [1 ]
Yao, TS [1 ]
Affiliation
[1] Northeastern Univ, Nat Language Proc Lab, Shenyang, Peoples R China
DOI
Not available
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
When text categorization is applied to complex tasks, hand-labeling the large amounts of training data needed for good performance is tedious and expensive. In this paper, we put forward an approach to text categorization that requires no labeled documents. The proposed approach automatically learns features using bootstrapping. The input consists of a small set of keywords per class and a large amount of easily obtained unlabeled documents. Using these automatically learned features, we develop a naive Bayes classifier. The classifier achieves an F1 of 82.8% when classifying a set of web documents into 10 categories, outperforming a naive Bayes classifier trained by supervised learning when the number of features is small.
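The abstract does not spell out the bootstrapping procedure, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' algorithm. It starts from a few seed keywords per class, pseudo-labels unlabeled documents by keyword matches, promotes words concentrated in one class to features, and then trains a naive Bayes classifier on the learned features. The function names, the scoring rule, the tie handling, and the `rounds` and `top_k` parameters are all hypothetical choices made for illustration.

```python
import math
from collections import Counter, defaultdict


def pseudo_label(tokens, features):
    """Return the class whose features match the most tokens, or None if unclear."""
    scores = {c: sum(t in f for t in tokens) for c, f in features.items()}
    best = max(scores, key=scores.get)
    # Assumption: skip documents with no match or with a tie between classes.
    if scores[best] == 0 or list(scores.values()).count(scores[best]) > 1:
        return None
    return best


def bootstrap_features(docs, seeds, rounds=2, top_k=2):
    """Grow each class's feature set from its seed keywords using unlabeled docs."""
    features = {c: set(words) for c, words in seeds.items()}
    for _ in range(rounds):
        labeled = defaultdict(list)
        for doc in docs:
            tokens = doc.lower().split()
            label = pseudo_label(tokens, features)
            if label is not None:
                labeled[label].append(tokens)
        counts = {c: Counter(t for d in ds for t in d) for c, ds in labeled.items()}
        total = Counter()
        for cnt in counts.values():
            total.update(cnt)
        for c, cnt in counts.items():
            # Hypothetical score: class purity weighted by log frequency.
            def score(w):
                return cnt[w] / total[w] * math.log(1 + cnt[w])
            ranked = sorted(cnt, key=lambda w: (-score(w), w))
            new = [w for w in ranked if all(w not in f for f in features.values())]
            features[c].update(new[:top_k])
    return features


def train_nb(docs, features):
    """Collect naive Bayes counts from pseudo-labeled docs, using learned features only."""
    vocab = set().union(*features.values())
    word_counts = {c: Counter() for c in features}
    doc_counts = Counter()
    for doc in docs:
        tokens = [t for t in doc.lower().split() if t in vocab]
        label = pseudo_label(tokens, features)
        if label is not None:
            doc_counts[label] += 1
            word_counts[label].update(tokens)
    return vocab, word_counts, doc_counts


def classify(text, model):
    """Pick the class with the highest Laplace-smoothed log posterior."""
    vocab, word_counts, doc_counts = model
    n_docs = sum(doc_counts.values())
    best_class, best_lp = None, -math.inf
    for c in word_counts:
        lp = math.log((doc_counts[c] + 1) / (n_docs + len(word_counts)))
        denom = sum(word_counts[c].values()) + len(vocab)
        for t in text.lower().split():
            if t in vocab:
                lp += math.log((word_counts[c][t] + 1) / denom)
        if lp > best_lp:
            best_class, best_lp = c, lp
    return best_class


# Toy data, invented for illustration only.
seeds = {"sports": ["football"], "finance": ["stock"]}
docs = [
    "football match goal team",
    "football team coach goal",
    "stock market shares price",
    "stock price bank shares",
    "team goal coach win",
    "market shares bank profit",
]
features = bootstrap_features(docs, seeds)
model = train_nb(docs, features)
print(classify("coach team win", model))
print(classify("bank market", model))
```

On this toy corpus, the last two documents contain no seed keyword and only become usable after the first round has added words such as "goal" and "shares" to the feature sets, which is the effect bootstrapping is meant to provide.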
Pages: 571-579
Page count: 9