Topic Labeled Text Classification: A Weakly Supervised Approach

Cited by: 17
Authors
Hingmire, Swapnil [1, 2]
Chakraborti, Sutanu [2 ]
Affiliations
[1] Tata Res Dev & Design Ctr, Syst Res Lab, Pune, Maharashtra, India
[2] Indian Inst Technol Madras, Dept Comp Sci & Engn, Madras, Tamil Nadu, India
Keywords
text classification; topic modelling; weakly supervised; semi supervised
DOI
10.1145/2600428.2609565
Chinese Library Classification number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Supervised text classifiers require extensive human expertise and labeling efforts. In this paper, we propose a weakly supervised text classification algorithm based on the labeling of Latent Dirichlet Allocation (LDA) topics. Our algorithm is based on the generative property of LDA. In our algorithm, we ask an annotator to assign one or more class labels to each topic, based on its most probable words. We classify a document based on its posterior topic proportions and the class labels of the topics. We also enhance our approach by incorporating domain knowledge in the form of labeled words. We evaluate our approach on four real world text classification datasets. The results show that our approach is more accurate in comparison to semi-supervised techniques from previous work. A central contribution of this work is an approach that delivers effectiveness comparable to the state-of-the-art supervised techniques in hard-to-classify domains, with very low overheads in terms of manual knowledge engineering.
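The classification step described in the abstract (combining a document's posterior topic proportions with the annotator's class labels for each topic) can be sketched as follows. This is a minimal illustration of one plausible reading, not the authors' exact procedure: the function name `classify`, the example topics and labels, and the rule of splitting a multi-label topic's weight equally among its classes are all assumptions made for this sketch. The topic proportions are taken as given; in practice they would come from LDA inference.

```python
from collections import defaultdict


def classify(topic_proportions, topic_labels):
    """Score classes for one document and return (best_class, scores).

    topic_proportions: list of posterior topic weights for the document
                       (index = topic id), assumed to come from LDA.
    topic_labels:      dict mapping topic id -> list of class labels
                       assigned by the annotator (hypothetical format).
    """
    scores = defaultdict(float)
    for topic, weight in enumerate(topic_proportions):
        labels = topic_labels.get(topic, [])
        # Assumption: a topic with several labels shares its weight equally.
        for label in labels:
            scores[label] += weight / len(labels)
    if not scores:
        raise ValueError("no labeled topic contributes to this document")
    best = max(scores, key=scores.get)
    return best, dict(scores)


# Hypothetical annotation of four topics with one or more class labels.
topic_labels = {
    0: ["sports"],
    1: ["politics"],
    2: ["sports", "politics"],
    3: ["politics"],
}

# Hypothetical posterior topic proportions for a single document.
theta = [0.5, 0.1, 0.2, 0.2]

predicted, scores = classify(theta, topic_labels)
print(predicted, scores)
```

Summing topic weights per class keeps the annotator's effort at the topic level: no document is ever labeled by hand, which matches the low manual overhead the abstract emphasizes.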
Pages: 385-394
Page count: 10