A comparative study on the use of labeled and unlabeled data for large margin classifiers

Cited by: 0
Authors
Takamura, H [1]
Okumura, M [1]
Affiliations
[1] Tokyo Inst Technol, Precis & Intelligence Lab, Midori Ku, Yokohama, Kanagawa 2268503, Japan
Source
Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Subject classification codes
081104; 0812; 0835; 1405
Abstract
We propose using both labeled and unlabeled data with the Expectation-Maximization (EM) algorithm to estimate a generative model, and then using this model to construct a Fisher kernel. Documents are modeled with the Naive Bayes generative probability. Through text categorization experiments, we show empirically that (a) the Fisher kernel with labeled and unlabeled data outperforms Naive Bayes classifiers with EM and other methods when a sufficient amount of labeled data is available, (b) the value of additional unlabeled data diminishes when the labeled data set is large enough to estimate a reliable model, (c) using categories as latent variables is effective, and (d) larger unlabeled training datasets yield better results.
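The pipeline the abstract describes (EM over labeled and unlabeled documents, then a Fisher kernel derived from the fitted model) might be sketched as follows. This is an illustrative reading and not the authors' implementation: the function names, the Laplace smoothing constant `alpha`, the multinomial form of Naive Bayes, the gradient taken with respect to the raw parameters, and the identity approximation to the Fisher information matrix are all assumptions made here.

```python
import numpy as np

def em_naive_bayes(X_lab, y_lab, X_unl, n_classes, n_iter=20, alpha=1.0):
    """Fit a multinomial Naive Bayes model on word-count matrices with EM.

    Labeled rows keep fixed one-hot class responsibilities; unlabeled rows
    get soft responsibilities re-estimated in each E-step.
    """
    R_lab = np.eye(n_classes)[y_lab]
    X = np.vstack([X_lab, X_unl])
    # Zero responsibilities mean the first M-step uses labeled data only.
    R = np.vstack([R_lab, np.zeros((len(X_unl), n_classes))])
    for _ in range(n_iter):
        # M-step: smoothed class priors and per-class word probabilities.
        prior = (R.sum(axis=0) + alpha) / (R.sum() + alpha * n_classes)
        counts = R.T @ X + alpha
        theta = counts / counts.sum(axis=1, keepdims=True)
        # E-step: class posteriors for the unlabeled documents.
        post = class_posterior(X_unl, prior, theta)
        R = np.vstack([R_lab, post])
    return prior, theta

def class_posterior(X, prior, theta):
    """Return P(class | document) under the Naive Bayes model."""
    log_joint = np.log(prior) + X @ np.log(theta).T
    log_joint -= log_joint.max(axis=1, keepdims=True)  # numerical stability
    post = np.exp(log_joint)
    return post / post.sum(axis=1, keepdims=True)

def fisher_scores(X, prior, theta):
    """Gradient of log p(x) w.r.t. the priors and word probabilities.

    The class is treated as a latent variable, so
    d log p / d prior_c    = P(c|x) / prior_c
    d log p / d theta_{cw} = P(c|x) * x_w / theta_{cw}
    """
    post = class_posterior(X, prior, theta)
    g_prior = post / prior
    g_theta = post[:, :, None] * X[:, None, :] / theta[None, :, :]
    return np.hstack([g_prior, g_theta.reshape(len(X), -1)])

def fisher_kernel(A, B, prior, theta):
    """Fisher kernel with the Fisher information approximated by identity."""
    return fisher_scores(A, prior, theta) @ fisher_scores(B, prior, theta).T
```

The resulting Gram matrix could then be handed to any large margin classifier that accepts a precomputed kernel, for example scikit-learn's `SVC(kernel="precomputed")`.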
Pages: 456-465
Page count: 10