SEMI-SUPERVISED LEARNING OF LANGUAGE MODEL USING UNSUPERVISED TOPIC MODEL

Cited by: 0
Authors
Bai, Shuanhu [1]
Huang, Chien-Lin [1]
Ma, Bin [1]
Li, Haizhou [1]
Affiliations
[1] Inst Infocomm Res, Singapore, Singapore
Keywords
semi-supervised learning; language model; topic model
DOI
Not available
Chinese Library Classification number
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
We present a semi-supervised learning (SSL) method for building domain-specific language models (LMs) from general-domain data using probabilistic latent semantic analysis (PLSA). The proposed technique first performs topic decomposition (TD) on the combined dataset of domain-specific and general-domain data. It then derives the latent topic distribution of the domain of interest and derives domain-specific word n-gram counts with a PLSA-style mixture model. Finally, it uses traditional n-gram modeling to construct domain-specific LMs from the domain-specific word n-gram counts. Experimental results show that this technique outperforms both state-of-the-art relative-entropy text selection and traditional supervised training methods.
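The abstract outlines the pipeline without giving formulas, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes a bag-of-words count matrix, works with unigram counts only (the paper derives word n-gram counts), and rescales the mixture to the total token count of the combined corpus. The function names `plsa` and `domain_counts`, the toy data, and that rescaling choice are all hypothetical.

```python
import numpy as np

def plsa(counts, n_topics, n_iter=50, seed=0):
    """Fit PLSA with plain EM. counts: (D, V) document-word count matrix."""
    rng = np.random.default_rng(seed)
    D, V = counts.shape
    eps = 1e-12  # guards against division by zero
    p_z_d = rng.random((D, n_topics))
    p_z_d /= p_z_d.sum(axis=1, keepdims=True)
    p_w_z = rng.random((n_topics, V))
    p_w_z /= p_w_z.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        # E-step: P(z | d, w) proportional to P(z | d) * P(w | z); shape (D, K, V)
        post = p_z_d[:, :, None] * p_w_z[None, :, :]
        post /= post.sum(axis=1, keepdims=True) + eps
        # M-step: re-estimate both distributions from expected counts
        weighted = counts[:, None, :] * post
        p_w_z = weighted.sum(axis=0)
        p_w_z /= p_w_z.sum(axis=1, keepdims=True) + eps
        p_z_d = weighted.sum(axis=2)
        p_z_d /= p_z_d.sum(axis=1, keepdims=True) + eps
    return p_z_d, p_w_z

def domain_counts(counts, is_domain, p_z_d, p_w_z):
    """Assumed mixture: P(w | domain) = sum_z P(z | domain) * P(w | z)."""
    doc_len = counts.sum(axis=1)
    weight = doc_len * np.asarray(is_domain, dtype=float)
    # Length-weighted average of P(z | d) over the in-domain documents only
    p_z_dom = (weight[:, None] * p_z_d).sum(axis=0) / weight.sum()
    p_w_dom = p_z_dom @ p_w_z
    # Rescale to pseudo-counts (assumption: total tokens of the combined corpus)
    return counts.sum() * p_w_dom, p_z_dom

# Toy combined corpus: 2 in-domain documents, 4 general-domain documents
counts = np.array([
    [5, 4, 3, 0, 0, 0],
    [4, 5, 2, 0, 1, 0],
    [0, 0, 1, 5, 4, 3],
    [3, 4, 4, 0, 0, 1],
    [0, 1, 0, 4, 5, 4],
    [1, 0, 0, 3, 4, 5],
], dtype=float)
is_domain = np.array([True, True, False, False, False, False])

p_z_d, p_w_z = plsa(counts, n_topics=2)
dom_counts, p_z_dom = domain_counts(counts, is_domain, p_z_d, p_w_z)
```

The resulting fractional counts in `dom_counts` could then be passed to any standard n-gram estimator that accepts non-integer counts, which corresponds to the final step described in the abstract.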
Pages: 5382 - 5385
Page count: 4
Related papers
50 in total
  • [1] A Hybrid Semi-supervised Topic Model
    Zhang, Yanning
    Wei, Wei
    [J]. INTELLIGENT SCIENCE AND INTELLIGENT DATA ENGINEERING, ISCIDE 2011, 2012, 7202 : 309 - 317
  • [2] Name Disambiguation Using Semi-supervised Topic Model
    Fu, JinLan
    Qiu, Jie
    Wang, Jing
    Li, Li
    [J]. ADVANCED INTELLIGENT COMPUTING THEORIES AND APPLICATIONS, ICIC 2015, PT III, 2015, 9227 : 471 - 480
  • [3] A jointly distributed semi-supervised topic model
    Zhang, Yanning
    Wei, Wei
    [J]. NEUROCOMPUTING, 2014, 134 : 38 - 45
  • [4] Semi-supervised and unsupervised discriminative language model training for automatic speech recognition
    Dikici, Erinc
    Saraclar, Murat
    [J]. SPEECH COMMUNICATION, 2016, 83 : 54 - 63
  • [5] Acoustic Model Bootstrapping Using Semi-Supervised Learning
    Chen, Langzhou
    Leutnant, Volker
    [J]. INTERSPEECH 2019, 2019 : 3198 - 3202
  • [6] Disfluency Correction using Unsupervised and Semi-supervised Learning
    Saini, Nikhil
    Trivedi, Drumil
    Khare, Shreya
    Dhamecha, Tejas I.
    Jyothi, Preethi
    Bharadwaj, Samarth
    Bhattacharyya, Pushpak
    [J]. 16TH CONFERENCE OF THE EUROPEAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (EACL 2021), 2021 : 3421 - 3427
  • [7] A Discriminative Model for Semi-Supervised Learning
    Balcan, Maria-Florina
    Blum, Avrim
    [J]. JOURNAL OF THE ACM, 2010, 57 (03)
  • [8] Semi-supervised Learning Using a Constrained Labeling LDA Model
    Guzman, Rel
    Ochoa-Luna, Eduardo
    [J]. PROCEEDINGS OF THE 2016 IEEE ANDESCON, 2016
  • [9] A Semi-Supervised Topic Model Incorporating Sentiment and Dynamic Characteristic
    Zhang, Lanshan
    Ding, Xi
    Tian, Ye
    Gong, Xiangyang
    Wang, Wendong
    [J]. CHINA COMMUNICATIONS, 2016, 13 (12) : 162 - 175
  • [10] Abnormal event detection with semi-supervised sparse topic model
    Wang, Jun
    Xia, Limin
    Hu, Xiangjie
    Xiao, Yongliang
    [J]. NEURAL COMPUTING & APPLICATIONS, 2019, 31 (05) : 1607 - 1617