Exploiting the value of class labels on high-dimensional feature spaces: topic models for semi-supervised document classification

Cited by: 6
Authors
Soleimani, Hossein [1]
Miller, David J. [1]
Affiliation
[1] Penn State Univ, Sch Elect Engn & Comp Sci, University Pk, PA 16802 USA
Keywords
Semi-supervised learning; Topic models; Document classification;
DOI
10.1007/s10044-017-0629-4
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Subject classification codes
081104; 0812; 0835; 1405;
Abstract
We propose a class-based mixture of topic models for classifying documents using both labeled and unlabeled examples (i.e., in a semi-supervised fashion). Most topic models incorporate documents' class labels by generating them after generating the words. In these models, the training class labels have small effect on the estimated topics, as they are effectively treated as just another word, amongst a huge set of word features. In this paper, we propose to increase the influence of class labels on topic models by generating the words in each document conditioned on the class label. We show that our specific generative process improves classification performance with small loss in test set log-likelihood. Within our framework, we provide a principled mechanism to control the contributions of the class labels and the word space to the likelihood function. Experiments show our approach achieves better classification accuracy compared to some standard semi-supervised and supervised topic models.
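The abstract's central idea, generating each document's words conditioned on its class label, can be illustrated with a minimal sketch. This is not the authors' model: the record does not give its details, so the code assumes a deliberately simplified setup in which each class has its own fixed mixture over shared topics. The names `class_prior`, `class_topic`, `topic_word`, `generate_document`, and `class_log_posterior` are hypothetical, and the toy parameter values are made up for illustration.

```python
import math
import random


def generate_document(rng, class_prior, class_topic, topic_word, n_words):
    """Sample the class label first, then sample every word given that label."""
    label = rng.choices(range(len(class_prior)), weights=class_prior)[0]
    words = []
    for _ in range(n_words):
        # Assumption: the topic of each word depends on the document's class.
        topic = rng.choices(range(len(topic_word)), weights=class_topic[label])[0]
        word = rng.choices(range(len(topic_word[topic])), weights=topic_word[topic])[0]
        words.append(word)
    return label, words


def class_log_posterior(words, class_prior, class_topic, topic_word):
    """Return log p(class | words) for every class via Bayes' rule."""
    scores = []
    for c, prior in enumerate(class_prior):
        score = math.log(prior)
        for w in words:
            # Marginalize the topic: p(w | c) = sum_k p(k | c) * p(w | k).
            p_word = sum(class_topic[c][k] * topic_word[k][w]
                         for k in range(len(topic_word)))
            score += math.log(p_word)
        scores.append(score)
    # Normalize in log space (log-sum-exp) for numerical stability.
    top = max(scores)
    log_norm = top + math.log(sum(math.exp(s - top) for s in scores))
    return [s - log_norm for s in scores]


# Toy parameters (illustrative only): 2 classes, 2 topics, 4 vocabulary words.
class_prior = [0.5, 0.5]
class_topic = [[0.9, 0.1], [0.1, 0.9]]
topic_word = [[0.7, 0.2, 0.05, 0.05], [0.05, 0.05, 0.2, 0.7]]

rng = random.Random(0)
label, words = generate_document(rng, class_prior, class_topic, topic_word, n_words=20)
log_post = class_log_posterior(words, class_prior, class_topic, topic_word)
predicted = max(range(len(log_post)), key=lambda c: log_post[c])
```

Because the label sits upstream of every word in this sketch, it shapes the whole word likelihood rather than acting as one extra feature among thousands, which is the contrast the abstract draws with models that generate the label after the words.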
Pages: 299-309
Page count: 11
Related papers
50 in total
  • [1] Exploiting the value of class labels on high-dimensional feature spaces: topic models for semi-supervised document classification
    Hossein Soleimani
    David J. Miller
    [J]. Pattern Analysis and Applications, 2019, 22 : 299 - 309
  • [2] Exploiting the Value of Class Labels in Topic Models for Semi-Supervised Document Classification
    Soleimani, Hossein
    Miller, David J.
    [J]. 2016 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2016, : 4025 - 4031
  • [3] Semi-supervised Multi-Label Topic Models for Document Classification and Sentence Labeling
    Soleimani, Hossein
    Miller, David J.
    [J]. CIKM'16: PROCEEDINGS OF THE 2016 ACM CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT, 2016, : 105 - 114
  • [4] Neural labeled LDA: a topic model for semi-supervised document classification
    Wang, Wei
    Guo, Bing
    Shen, Yan
    Yang, Han
    Chen, Yaosen
    Suo, Xinhua
    [J]. SOFT COMPUTING, 2021, 25 (23) : 14561 - 14571
  • [5] Neural labeled LDA: a topic model for semi-supervised document classification
    Wei Wang
    Bing Guo
    Yan Shen
    Han Yang
    Yaosen Chen
    Xinhua Suo
    [J]. Soft Computing, 2021, 25 : 14561 - 14571
  • [6] Online Semi-Supervised Classification on Multilabel Evolving High-Dimensional Text Streams
    Kumar, Jay
    Shao, Junming
    Kumar, Rajesh
    Din, Salah Ud
    Mawuli, Cobbinah B.
    Yang, Qinli
    [J]. IEEE TRANSACTIONS ON SYSTEMS MAN CYBERNETICS-SYSTEMS, 2023, 53 (10): : 5983 - 5995
  • [7] Fault Classification in High-Dimensional Complex Processes Using Semi-Supervised Deep Convolutional Generative Models
    Ko, Taeyoung
    Kim, Heeyoung
    [J]. IEEE TRANSACTIONS ON INDUSTRIAL INFORMATICS, 2020, 16 (04) : 2868 - 2877
  • [8] Robust safe semi-supervised learning framework for high-dimensional data classification
    Ma, Jun
    Zhu, Xiaolong
    [J]. AIMS MATHEMATICS, 2024, 9 (09): : 25705 - 25731
  • [9] Semi-supervised Distance Metric Learning in High-Dimensional Spaces by Using Equivalence Constraints
    Cevikalp, Hakan
    [J]. COMPUTER VISION, IMAGING AND COMPUTER GRAPHICS: THEORY AND APPLICATIONS, 2010, 68 : 242 - 254
  • [10] Simultaneous feature selection and classification via semi-supervised models
    Yang, Liming
    Wang, Laisheng
    [J]. ICNC 2007: THIRD INTERNATIONAL CONFERENCE ON NATURAL COMPUTATION, VOL 1, PROCEEDINGS, 2007, : 646 - +