Text classification by labeling words

Cited by: 0

Authors:

Liu, B [1]

Li, XL [1]

Lee, WS [1]

Yu, PS [1]

Affiliations:

[1] Univ Illinois, Dept Comp Sci, Chicago, IL 60680 USA

Source:

PROCEEDING OF THE NINETEENTH NATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND THE SIXTEENTH CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE | 2004

Keywords:

DOI:

Not available

Chinese Library Classification number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Traditionally, text classifiers are built from labeled training examples. Labeling is usually done manually by human experts (or the users), which is a labor-intensive and time-consuming process. In the past few years, researchers have investigated various forms of semi-supervised learning to reduce the burden of manual labeling. In this paper, we propose a different approach. Instead of labeling a set of documents, the proposed method labels a set of representative words for each class. It then uses these words to extract a set of documents for each class from a set of unlabeled documents to form the initial training set. The EM algorithm is then applied to build the classifier. The key issue of the approach is how to obtain a set of representative words for each class. One way is to ask the user to provide them, which is difficult because the user can usually give only a few words (which are insufficient for accurate learning). We propose a method to solve the problem. It combines clustering and feature selection. The technique can effectively rank the words in the unlabeled set according to their importance. The user then selects/labels some words from the ranked list for each class. This process requires less effort than providing words with no help or manually labeling documents. Our results show that the new method is highly effective and promising.
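
The pipeline described in the abstract (seed words per class, an initial training set extracted from unlabeled documents, then EM) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a multinomial naive Bayes model with Laplace smoothing as the classifier inside EM, assumes a simple "most seed-word hits wins" rule for extracting the initial documents, and omits the word-ranking step (clustering plus feature selection) entirely. The function names `initial_labels`, `m_step`, `e_step`, and `classify_with_seed_words` are hypothetical.

```python
import math
from collections import Counter


def initial_labels(docs, seed_words):
    """Label a document with a class only if that class's seed words
    occur strictly more often than any other class's (assumed rule)."""
    labels = []
    for doc in docs:
        hits = {c: sum(1 for w in doc if w in words)
                for c, words in seed_words.items()}
        best = max(hits, key=hits.get)
        tied = sum(1 for v in hits.values() if v == hits[best]) > 1
        labels.append(best if hits[best] > 0 and not tied else None)
    return labels


def m_step(docs, posteriors, classes, vocab):
    """Re-estimate class priors and word probabilities from (soft)
    class memberships, with Laplace smoothing."""
    priors, word_probs = {}, {}
    total = sum(sum(p.values()) for p in posteriors)
    for c in classes:
        weight = sum(p[c] for p in posteriors)
        priors[c] = (weight + 1.0) / (total + len(classes))
        counts = Counter()
        for doc, p in zip(docs, posteriors):
            for w in doc:
                counts[w] += p[c]
        denom = sum(counts.values()) + len(vocab)
        word_probs[c] = {w: (counts[w] + 1.0) / denom for w in vocab}
    return priors, word_probs


def e_step(docs, priors, word_probs, classes):
    """Compute P(class | document) for every document in log space."""
    posteriors = []
    for doc in docs:
        log_p = {c: math.log(priors[c])
                 + sum(math.log(word_probs[c][w]) for w in doc)
                 for c in classes}
        top = max(log_p.values())
        unnorm = {c: math.exp(v - top) for c, v in log_p.items()}
        z = sum(unnorm.values())
        posteriors.append({c: v / z for c, v in unnorm.items()})
    return posteriors


def classify_with_seed_words(docs, seed_words, iterations=10):
    """Seed-word labeling followed by a fixed number of EM rounds."""
    classes = sorted(seed_words)
    vocab = sorted({w for doc in docs for w in doc})
    labels = initial_labels(docs, seed_words)
    # Seed-labeled documents start one-hot; the rest carry zero weight
    # in the first M-step and are filled in by the first E-step.
    posteriors = [{c: 1.0 if c == lab else 0.0 for c in classes}
                  for lab in labels]
    for _ in range(iterations):
        priors, word_probs = m_step(docs, posteriors, classes, vocab)
        posteriors = e_step(docs, priors, word_probs, classes)
    return [max(p, key=p.get) for p in posteriors]
```

Here `docs` is a list of token lists and `seed_words` maps each class name to a set of user-selected words. A document that contains no seed word receives no initial label and is assigned a class only through the EM iterations.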


Pages: 425-430

Page count: 6
