Exploiting the Value of Class Labels in Topic Models for Semi-Supervised Document Classification

Authors
Soleimani, Hossein [1 ]
Miller, David J. [1 ]
Affiliation
[1] Penn State Univ, Sch Elect Engn & Comp Sci, University Pk, PA 16803 USA
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We propose a mixture of class-conditioned topic models for classifying text documents using both labeled and unlabeled training documents in a semi-supervised fashion. Most topic models incorporate documents' class labels by generating them after generating the word space. In these models, the training class labels have a relatively small effect on the estimated topics, because the likelihood function is dominated by the word space, whose size dwarfs a single class label per document. In this paper, we propose to increase the influence of class labels on model parameters by generating the word space in each document conditioned on the class label. We show that our specific generative process improves classification performance while maintaining the ability of the model to discover topics from the word space. Within our framework, we also provide a principled mechanism to control the contribution of the class labels and the word space to the likelihood function. Experimental results show that our approach achieves better classification performance than some standard semi-supervised and supervised topic models.
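The abstract gives no equations, so the following is only a minimal sketch of the general idea, assuming a toy model in which the class label is drawn first and every word is then drawn from topics whose proportions depend on that label. All names (`class_prior`, `class_topic_probs`, `topic_word_probs`, `label_weight`) and all parameter values are hypothetical illustrations, not the authors' notation or method; in particular, `label_weight` is just one simple way such a contribution control might look.

```python
import math
import random

# Hypothetical toy parameters (not from the paper): 2 classes, 2 topics, 3 words.
class_prior = [0.5, 0.5]
class_topic_probs = [[0.9, 0.1],       # topic proportions given class 0
                     [0.1, 0.9]]       # topic proportions given class 1
topic_word_probs = [[0.8, 0.1, 0.1],   # word distribution of topic 0
                    [0.1, 0.1, 0.8]]   # word distribution of topic 1


def sample_document(n_words, rng):
    """Draw the label first, then draw each word conditioned on that label."""
    label = rng.choices(range(len(class_prior)), weights=class_prior, k=1)[0]
    words = []
    for _ in range(n_words):
        topic = rng.choices(range(len(topic_word_probs)),
                            weights=class_topic_probs[label], k=1)[0]
        word = rng.choices(range(len(topic_word_probs[topic])),
                           weights=topic_word_probs[topic], k=1)[0]
        words.append(word)
    return label, words


def weighted_log_likelihood(label, words, label_weight=1.0):
    """Log-likelihood of (label, words); label_weight scales the label term.

    The word term has one summand per word, the label term only one,
    which is why an unweighted label term is easily outweighed.
    """
    label_term = math.log(class_prior[label])
    word_term = 0.0
    for w in words:
        # Marginalize the topic of each word given the class label.
        p = sum(class_topic_probs[label][k] * topic_word_probs[k][w]
                for k in range(len(topic_word_probs)))
        word_term += math.log(p)
    return label_weight * label_term + word_term


def classify(words, label_weight=1.0):
    """Pick the class whose class-conditioned model best explains the words."""
    scores = [weighted_log_likelihood(c, words, label_weight)
              for c in range(len(class_prior))]
    return scores.index(max(scores))
```

Because the words are generated conditioned on the label in this sketch, an unlabeled document can be classified by comparing how well each class-conditioned model explains its words, which is what `classify` does.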
Pages: 4025-4031
Page count: 7