Hidden Markov Acoustic Modeling With Bootstrap and Restructuring for Low-Resourced Languages

Cited by: 11
Authors
Cui, Xiaodong [1]
Xue, Jian [1]
Chen, Xin [2]
Olsen, Peder A. [1]
Dognin, Pierre L. [1]
Chaudhari, Upendra V. [1]
Hershey, John R. [3]
Zhou, Bowen [1]
Affiliations
[1] IBM TJ Watson Res Ctr, Yorktown Hts, NY 10598 USA
[2] Pearson, Knowledge Technol Grp, Menlo Pk, CA 94025 USA
[3] Mitsubishi Elect Res Labs, Cambridge, MA 02139 USA
Keywords
Bagging; bootstrap and restructuring; hidden Markov model (HMM); low-resourced language; large vocabulary continuous speech recognition (LVCSR); maximum likelihood; speech
DOI
10.1109/TASL.2012.2199982
Chinese Library Classification (CLC) number
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
This paper proposes an acoustic modeling approach based on bootstrap and restructuring for dealing with data sparsity in low-resourced languages. The goal of the approach is to improve the statistical reliability of acoustic modeling for automatic speech recognition (ASR) under the speed, memory, and response-latency requirements of real-world applications. In this approach, randomized hidden Markov models (HMMs) estimated from bootstrapped training data are aggregated for reliable sequence prediction. The aggregation yields an HMM with superior prediction capability at the cost of a substantially larger size. For practical use, the aggregated HMM is restructured by Gaussian clustering followed by model refinement. The restructuring aims to reduce the aggregated HMM to a desirable model size while keeping its performance close to that of the original aggregated HMM. To that end, various Gaussian clustering criteria and model refinement algorithms are investigated in the full covariance model space before the conversion to the diagonal covariance model space in the last stage of the restructuring. Large vocabulary continuous speech recognition (LVCSR) experiments on Pashto and Dari show that acoustic models obtained by the proposed approach can yield superior performance over the conventional training procedure with almost the same run-time memory consumption and decoding speed.
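The bootstrap, aggregate, and restructure pipeline described in the abstract can be illustrated with a toy sketch. It is not the authors' implementation: it models a single HMM state with one Gaussian per bootstrap replica, and the function names, the log-determinant merge cost, and the greedy pairwise clustering are all assumptions chosen for brevity. The abstract only says that several clustering criteria and refinement algorithms were investigated, and the refinement step is omitted here.

```python
import numpy as np

def bootstrap_gaussians(frames, n_replicas, rng):
    """Fit one full-covariance Gaussian per bootstrap resample (hypothetical helper)."""
    n = len(frames)
    comps = []
    for _ in range(n_replicas):
        sample = frames[rng.integers(0, n, size=n)]  # resample with replacement
        mean = sample.mean(axis=0)
        cov = np.cov(sample, rowvar=False, bias=True)
        comps.append((1.0 / n_replicas, mean, cov))  # aggregate with equal weights
    return comps

def merge(a, b):
    """Moment-matched merge of two weighted Gaussians (weight, mean, covariance)."""
    wa, ma, ca = a
    wb, mb, cb = b
    w = wa + wb
    m = (wa * ma + wb * mb) / w
    da, db = ma - m, mb - m
    c = (wa * (ca + np.outer(da, da)) + wb * (cb + np.outer(db, db))) / w
    return (w, m, c)

def merge_cost(a, b):
    """Weighted log-determinant increase caused by a merge (assumed criterion)."""
    w, _, c = merge(a, b)
    logdet = lambda x: np.linalg.slogdet(x)[1]
    return 0.5 * (w * logdet(c) - a[0] * logdet(a[2]) - b[0] * logdet(b[2]))

def restructure(comps, target):
    """Greedily merge the cheapest pair until only `target` Gaussians remain."""
    comps = list(comps)
    while len(comps) > target:
        pairs = [(merge_cost(comps[i], comps[j]), i, j)
                 for i in range(len(comps)) for j in range(i + 1, len(comps))]
        _, i, j = min(pairs)
        merged = merge(comps[i], comps[j])
        comps = [c for k, c in enumerate(comps) if k not in (i, j)] + [merged]
    return comps

def to_diagonal(comps):
    """Final stage: keep only the diagonal of each full covariance."""
    return [(w, m, np.diag(np.diag(c))) for w, m, c in comps]
```

Clustering in the full covariance space and converting to diagonal covariances only at the end mirrors the ordering stated in the abstract. A real system would do this per HMM state over mixtures of Gaussians and follow the clustering with model refinement.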
Pages: 2252-2264
Page count: 13