A Parallel Training Algorithm for Hierarchical Pitman-Yor Process Language Models

Cited by: 0
Authors
Huang, Songfang [1 ]
Renals, Steve [1 ]
Affiliations
[1] Univ Edinburgh, Ctr Speech Technol Res, Edinburgh EH8 9AB, Midlothian, Scotland
Keywords
language model; Pitman-Yor processes; hierarchical Bayesian models; parallel training; meetings;
DOI
Not available
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
The Hierarchical Pitman-Yor Process Language Model (HPYLM) is a Bayesian language model based on a non-parametric prior, the Pitman-Yor process. It has been demonstrated, both theoretically and empirically, that the HPYLM provides better smoothing for language modeling than state-of-the-art approaches such as interpolated Kneser-Ney and modified Kneser-Ney smoothing. However, estimating Bayesian language models is expensive in both computation time and memory: the inference is approximate and requires a number of iterations to converge. In this paper, we present a parallel training algorithm for the HPYLM, which enables the approach to be applied in the context of automatic speech recognition, using large training corpora with large vocabularies. We demonstrate the effectiveness of the proposed algorithm by estimating language models from meeting-transcription corpora containing over 200 million words, and observe significant reductions in perplexity and word error rate.
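To make the smoothing behaviour mentioned in the abstract concrete, the sketch below shows the predictive rule of a single (non-hierarchical) Pitman-Yor "Chinese restaurant". This is an illustrative simplification, not the authors' implementation: the full HPYLM arranges one such restaurant per n-gram context in a hierarchy and infers the seating arrangements by iterative Gibbs sampling, and it tracks multiple tables per word, whereas this sketch assumes one table per word. The discount `d` and concentration `theta` values are hypothetical.

```python
def pyp_predictive(counts, d, theta, base_prob):
    """Predictive word probabilities under a one-level Pitman-Yor process.

    counts:    dict word -> observed count (customers), assuming one table
               per word (a simplification of the full seating model).
    d:         discount parameter, 0 <= d < 1.
    theta:     concentration parameter, theta > -d.
    base_prob: dict word -> probability under the base distribution
               (in HPYLM, the lower-order / shorter-context model).
    """
    n = sum(counts.values())   # total customers in the restaurant
    t = len(counts)            # total occupied tables (one per word here)
    probs = {}
    for w, p0 in base_prob.items():
        c = counts.get(w, 0)
        # Probability mass from existing tables, discounted by d.
        seated = max(c - d, 0.0) / (theta + n) if n > 0 else 0.0
        # Mass reserved for a new table, redistributed via the base model;
        # this discount-and-redistribute step is what resembles Kneser-Ney.
        new_table = (theta + d * t) / (theta + n) if n > 0 else 1.0
        probs[w] = seated + new_table * p0
    return probs
```

Note that interpolated Kneser-Ney arises as a special case of this rule, which is why the HPYLM can be viewed as a Bayesian justification of Kneser-Ney smoothing.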
Pages: 2663-2666 (4 pages)