Language Model Adaptation Using Latent Dirichlet Allocation and an Efficient Topic Inference Algorithm

Cited by: 0
Authors
Heidel, Aaron [1]
Chang, Hung-an
Lee, Lin-shan [1]
Affiliations
[1] Natl Taiwan Univ, Dept Comp Sci & Informat Engn, Taipei 10764, Taiwan
Keywords
language model; unsupervised adaptation; topic modeling; speech recognition
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We present an effort to perform topic mixture-based language model adaptation using latent Dirichlet allocation (LDA). We use probabilistic latent semantic analysis (PLSA) to automatically cluster a heterogeneous training corpus, and train an LDA model using the resultant topic-document assignments. Using this LDA model, we then construct topic-specific corpora at the utterance level for interpolation with a background language model during language model adaptation. We also present a novel iterative algorithm for LDA topic inference. Very encouraging results were obtained in preliminary experiments with broadcast news in Mandarin Chinese.
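The interpolation step described in the abstract can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the paper: unigram models stand in for the n-gram models a recognizer would use, the topic weights are estimated with a standard EM update for mixture weights (not the authors' novel inference algorithm, which the abstract does not specify), and all names, probabilities, and the interpolation weight `lam` are hypothetical.

```python
# Illustrative sketch only: unigram topic mixture interpolated with a
# background model. Not the authors' method; all values are made up.

def infer_topic_weights(words, topic_word_probs, iterations=50):
    """Standard EM estimate of mixture weights over fixed topic unigram models."""
    k = len(topic_word_probs)
    weights = [1.0 / k] * k
    if not words:
        return weights  # nothing to adapt to: stay uniform
    for _ in range(iterations):
        counts = [0.0] * k
        for w in words:
            # E-step: posterior responsibility of each topic for this word.
            # The tiny floor avoids a zero denominator for unseen words.
            joint = [weights[t] * topic_word_probs[t].get(w, 1e-12) for t in range(k)]
            total = sum(joint)
            for t in range(k):
                counts[t] += joint[t] / total
        # M-step: new weights are the average responsibilities.
        weights = [c / len(words) for c in counts]
    return weights


def adapted_prob(word, weights, topic_word_probs, background_probs, lam=0.5):
    """Linear interpolation: lam * background + (1 - lam) * topic mixture."""
    topic_mix = sum(wt * tp.get(word, 0.0) for wt, tp in zip(weights, topic_word_probs))
    return lam * background_probs.get(word, 0.0) + (1.0 - lam) * topic_mix


# Hypothetical toy models over a three-word vocabulary.
topics = [
    {"stock": 0.6, "market": 0.3, "game": 0.1},  # assumed "finance" topic
    {"stock": 0.1, "market": 0.1, "game": 0.8},  # assumed "sports" topic
]
background = {"stock": 0.3, "market": 0.2, "game": 0.5}

utterance = ["stock", "market", "stock"]
weights = infer_topic_weights(utterance, topics)
p_stock = adapted_prob("stock", weights, topics, background)
```

In this toy setup the utterance pulls the mixture toward the first topic, so the adapted probability of "stock" rises above its background value while the adapted distribution still sums to one over the vocabulary.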
Pages: 1145 / +
Page count: 2
Related papers
50 in total
  • [41] Context-Aware Latent Dirichlet Allocation for Topic Segmentation
    Li, Wenbo
    Matsukawa, Tetsu
    Saigo, Hiroto
    Suzuki, Einoshin
    [J]. ADVANCES IN KNOWLEDGE DISCOVERY AND DATA MINING, PAKDD 2020, PT I, 2020, 12084 : 475 - 486
  • [42] An Improved Latent Dirichlet Allocation Method for Service Topic Detection
    Guo Lantian
    Li Zhe
    Yang Tao
    Zhang Huixiang
    Mu Dejun
    Li Yang
    [J]. PROCEEDINGS OF THE 35TH CHINESE CONTROL CONFERENCE 2016, 2016, : 7045 - 7049
  • [43] Topic modeling with latent Dirichlet allocation for cancer disease posts
    Altintas, Volkan
    Albayrak, Mehmet
    Topal, Kamil
    [J]. JOURNAL OF THE FACULTY OF ENGINEERING AND ARCHITECTURE OF GAZI UNIVERSITY, 2021, 36 (04): : 2183 - 2196
  • [44] Constrained Latent Dirichlet Allocation for Subgroup Discovery with Topic Rules
    Li, Rui
    Ahmadi, Zahra
    Kramer, Stefan
    [J]. 21ST EUROPEAN CONFERENCE ON ARTIFICIAL INTELLIGENCE (ECAI 2014), 2014, 263 : 519 - +
  • [45] Topic Modelling Twitter Data with Latent Dirichlet Allocation Method
    Negara, Edi Surya
    Triadi, Dendi
    Andryani, Ria
    [J]. 2019 3RD INTERNATIONAL CONFERENCE ON ELECTRICAL ENGINEERING AND COMPUTER SCIENCE (ICECOS 2019), 2019, : 386 - 390
  • [46] A Latent Concept Topic Model for Robust Topic Inference Using Word Embeddings
    Hu, Weihua
    Tsujii, Jun'ichi
    [J]. PROCEEDINGS OF THE 54TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2016), VOL 2, 2016, : 380 - 386
  • [47] LDAPrototype: a model selection algorithm to improve reliability of latent Dirichlet allocation
    Rieger, Jonas
    Jentsch, Carsten
    Rahnenfuehrer, Jorg
    [J]. PEERJ COMPUTER SCIENCE, 2024, 10
  • [48] Automatic Topic Clustering Using Latent Dirichlet Allocation with Skip-gram Model on Final Project Abstracts
    Bunyamin, Hendra
    Sulistiani, Lisan
    [J]. 2017 21ST INTERNATIONAL COMPUTER SCIENCE AND ENGINEERING CONFERENCE (ICSEC 2017), 2017, : 264 - 267
  • [49] Topic detection and tracking for conversational content by using conceptual dynamic latent Dirichlet allocation
    Yeh, Jui-Feng
    Tan, Yi-Shan
    Lee, Chen-Hsien
    [J]. NEUROCOMPUTING, 2016, 216 : 310 - 318
  • [50] Mining Web Log Data for News Topic Modeling Using Latent Dirichlet Allocation
    Surjandari, Isti
    Rosyidah, Asma
    Zulkarnain
    Laoh, Enrico
    [J]. 2018 5TH INTERNATIONAL CONFERENCE ON INFORMATION SCIENCE AND CONTROL ENGINEERING (ICISCE 2018), 2018, : 331 - 335