Text classification by labeling words

Cited by: 0
Authors
Liu, B [1]
Li, XL [1]
Lee, WS [1]
Yu, PS [1]
Affiliations
[1] Univ Illinois, Dept Comp Sci, Chicago, IL 60680 USA
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Traditionally, text classifiers are built from labeled training examples. Labeling is usually done manually by human experts (or the users), which is a labor-intensive and time-consuming process. In the past few years, researchers have investigated various forms of semi-supervised learning to reduce the burden of manual labeling. In this paper, we propose a different approach. Instead of labeling a set of documents, the proposed method labels a set of representative words for each class. It then uses these words to extract a set of documents for each class from a set of unlabeled documents to form the initial training set. The EM algorithm is then applied to build the classifier. The key issue of the approach is how to obtain a set of representative words for each class. One way is to ask the user to provide them, which is difficult because the user can usually give only a few words (which are insufficient for accurate learning). We propose a method to solve this problem. It combines clustering and feature selection. The technique can effectively rank the words in the unlabeled set according to their importance. The user then selects/labels some words from the ranked list for each class. This process requires less effort than providing words with no help or manually labeling documents. Our results show that the new method is highly effective and promising.
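The pipeline outlined in the abstract (label words, pull out an initial training set, then run EM) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the naive Bayes model, the Laplace smoothing, the "most seed-word hits wins" rule, the iteration count, and all function names (`seed_label`, `train_nb`, `posterior`, `em_classify`) are assumptions made for illustration. The word-ranking step that combines clustering and feature selection is not covered; the seed words are simply given.

```python
import math
from collections import Counter

def seed_label(raw_docs, seed_words):
    """Give a document the class whose seed words it mentions most.
    Returns None when no seed word matches or when classes tie (an assumed rule)."""
    labels = []
    for doc in raw_docs:
        tokens = doc.lower().split()
        hits = {c: sum(t in words for t in tokens) for c, words in seed_words.items()}
        best = max(hits, key=hits.get)
        unique = list(hits.values()).count(hits[best]) == 1
        labels.append(best if hits[best] > 0 and unique else None)
    return labels

def train_nb(docs, resp, classes, vocab):
    """M-step: Laplace-smoothed naive Bayes parameters from soft class weights."""
    prior, cond = {}, {}
    total = sum(sum(r.values()) for r in resp)
    for c in classes:
        weight = sum(r[c] for r in resp)
        prior[c] = (weight + 1.0) / (total + len(classes))
        counts = Counter()
        for doc, r in zip(docs, resp):
            for t in doc:
                counts[t] += r[c]
        denom = sum(counts.values()) + len(vocab)
        cond[c] = {t: (counts[t] + 1.0) / denom for t in vocab}
    return prior, cond

def posterior(doc, prior, cond, classes):
    """E-step for one document: P(class | doc) under the current model."""
    logp = {c: math.log(prior[c]) + sum(math.log(cond[c][t]) for t in doc)
            for c in classes}
    m = max(logp.values())
    unnorm = {c: math.exp(v - m) for c, v in logp.items()}
    z = sum(unnorm.values())
    return {c: v / z for c, v in unnorm.items()}

def em_classify(raw_docs, seed_words, iterations=10):
    """Seed-label documents, then alternate M- and E-steps over all documents."""
    classes = sorted(seed_words)
    docs = [d.lower().split() for d in raw_docs]
    vocab = {t for d in docs for t in d}
    initial = seed_label(raw_docs, seed_words)
    # Seed-labeled documents start with weight 1.0; the rest contribute nothing
    # until the first E-step. Seed labels are treated as noisy, so every
    # document is re-estimated in later iterations (an assumption).
    resp = [{c: 1.0 if c == lab else 0.0 for c in classes} for lab in initial]
    for _ in range(iterations):
        prior, cond = train_nb(docs, resp, classes, vocab)
        resp = [posterior(d, prior, cond, classes) for d in docs]
    return [max(r, key=r.get) for r in resp]

# Toy data, invented purely for illustration.
docs = [
    "the team won the football match",
    "the striker scored a goal in the match",
    "the election results favored the senator",
    "the senator gave a speech about the vote",
]
seeds = {"sports": {"football"}, "politics": {"election"}}
print(seed_label(docs, seeds))
print(em_classify(docs, seeds))
```

In this toy run only the first and third documents contain a seed word, so the other two start unlabeled; EM then assigns them through words they share with the seed-labeled documents ("match", "senator").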
Pages: 425-430
Page count: 6