Automatic learning features using bootstrapping for text categorization

Cited by: 0
Authors
Chen, WL [1]
Zhu, JB [1]
Wu, HL [1]
Yao, TS [1]
Affiliations
[1] Northeastern Univ, Nat Language Proc Lab, Shenyang, Peoples R China
Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
When text categorization is applied to complex tasks, hand-labeling the large amounts of training data needed for good performance is tedious and expensive. In this paper, we propose an approach to text categorization that requires no labeled documents. The approach learns features automatically using bootstrapping. Its input consists of a small set of keywords per class and a large number of easily obtained unlabeled documents. Using these automatically learned features, we build a naive Bayes classifier. The classifier achieves 82.8% F1 when classifying a set of web documents into 10 categories, which is better than naive Bayes trained by supervised learning when the number of features is small.
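The abstract does not give the scoring function, the stopping rule, or the feature-selection criterion, so the following is only a minimal sketch of the general idea under stated assumptions: seed keywords pseudo-label the unlabeled documents, words that occur mostly in one class's pseudo-labeled documents are added as new features, and a multinomial naive Bayes classifier with add-one smoothing is then trained on the pseudo-labels. All function names, the document-frequency-difference score, the `rounds` and `per_round` parameters, and the toy documents are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter, defaultdict

def tokenize(text):
    # Assumption: whitespace tokenization on lowercased text.
    return text.lower().split()

def pseudo_label(docs, features):
    """Label a document with the class whose features it matches most; skip ties."""
    labels = {}
    for i, doc in enumerate(docs):
        counts = Counter(tokenize(doc))
        scores = {c: sum(counts[w] for w in words) for c, words in features.items()}
        ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
        runner_up = ranked[1][1] if len(ranked) > 1 else 0
        if ranked[0][1] > runner_up:
            labels[i] = ranked[0][0]
    return labels

def bootstrap_features(docs, seeds, rounds=2, per_round=2):
    """Grow each class's feature set from its seed keywords (assumed scoring rule)."""
    features = {c: set(words) for c, words in seeds.items()}
    for _ in range(rounds):
        labels = pseudo_label(docs, features)
        doc_freq = defaultdict(Counter)  # class -> word -> number of documents
        for i, c in labels.items():
            doc_freq[c].update(set(tokenize(docs[i])))
        taken = set().union(*features.values())
        for c in sorted(features):
            def score(w, c=c):
                # Assumed score: document frequency in class c minus elsewhere.
                other = sum(doc_freq[o][w] for o in features if o != c)
                return doc_freq[c][w] - other
            candidates = [w for w in doc_freq[c] if w not in taken and score(w) > 0]
            candidates.sort(key=lambda w: (-score(w), w))
            features[c].update(candidates[:per_round])
            taken.update(candidates[:per_round])
    return features

def train_naive_bayes(docs, labels, features):
    """Multinomial naive Bayes over the learned features, with add-one smoothing."""
    vocab = set().union(*features.values())
    classes = sorted(features)
    doc_counts = Counter(labels.values())
    word_counts = {c: Counter() for c in classes}
    for i, c in labels.items():
        word_counts[c].update(w for w in tokenize(docs[i]) if w in vocab)
    total_docs = sum(doc_counts.values())
    model = {}
    for c in classes:
        total = sum(word_counts[c].values())
        log_prior = math.log((doc_counts[c] + 1) / (total_docs + len(classes)))
        log_like = {w: math.log((word_counts[c][w] + 1) / (total + len(vocab)))
                    for w in vocab}
        model[c] = (log_prior, log_like)
    return model

def classify(model, text):
    tokens = tokenize(text)
    def log_posterior(c):
        log_prior, log_like = model[c]
        return log_prior + sum(log_like[w] for w in tokens if w in log_like)
    return max(sorted(model), key=log_posterior)

# Toy data, invented for illustration only.
docs = [
    "football match goal striker",
    "football striker scored goal",
    "match referee goal football",
    "stock market shares investor",
    "market investor bought shares",
    "stock shares dividend market",
]
seeds = {"sports": ["football"], "finance": ["market"]}

features = bootstrap_features(docs, seeds)
labels = pseudo_label(docs, features)
model = train_naive_bayes(docs, labels, features)
print(classify(model, "the striker scored a goal"))
print(classify(model, "investor sold shares"))
```

No hand-labeled document is used anywhere in the sketch: the only supervision is the one seed keyword per class, which mirrors the input described in the abstract.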
Pages: 571-579
Number of pages: 9