Hidden Markov Acoustic Modeling With Bootstrap and Restructuring for Low-Resourced Languages

Cited by: 11
Authors
Cui, Xiaodong [1]
Xue, Jian [1]
Chen, Xin [2]
Olsen, Peder A. [1]
Dognin, Pierre L. [1]
Chaudhari, Upendra V. [1]
Hershey, John R. [3]
Zhou, Bowen [1]
Affiliations
[1] IBM TJ Watson Res Ctr, Yorktown Hts, NY 10598 USA
[2] Pearson, Knowledge Technol Grp, Menlo Pk, CA 94025 USA
[3] Mitsubishi Elect Res Labs, Cambridge, MA 02139 USA
Keywords
Bagging; bootstrap and restructuring; hidden Markov model (HMM); low-resourced language; large vocabulary continuous speech recognition (LVCSR); maximum likelihood; speech
DOI
10.1109/TASL.2012.2199982
Chinese Library Classification (CLC) number
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
This paper proposes an acoustic modeling approach based on bootstrap and restructuring for dealing with data sparsity in low-resourced languages. The goal of the approach is to improve the statistical reliability of acoustic modeling for automatic speech recognition (ASR) under the speed, memory and response latency requirements of real-world applications. In this approach, randomized hidden Markov models (HMMs) estimated from the bootstrapped training data are aggregated for reliable sequence prediction. The aggregation leads to an HMM with superior prediction capability at the cost of a substantially larger size. For practical usage, the aggregated HMM is restructured by Gaussian clustering followed by model refinement. The restructuring aims to reduce the aggregated HMM to a desirable model size while keeping its performance close to that of the original aggregated HMM. To that end, various Gaussian clustering criteria and model refinement algorithms have been investigated in the full covariance model space before the conversion to the diagonal covariance model space in the last stage of the restructuring. Large vocabulary continuous speech recognition (LVCSR) experiments on Pashto and Dari have shown that acoustic models obtained by the proposed approach can yield superior performance over the conventional training procedure with almost the same run-time memory consumption and decoding speed.
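The pipeline the abstract describes (bootstrap, aggregate, cluster in the full covariance space, convert to diagonal covariance) can be illustrated with a toy sketch for the emission density of a single HMM state. This is not the authors' implementation. Every function name is hypothetical, each bootstrap replicate is reduced to one Gaussian, the merge criterion (weighted log-determinant increase) is only an assumed stand-in for the clustering criteria studied in the paper, and the model refinement step is omitted.

```python
import numpy as np

def fit_gaussian(x):
    """Maximum-likelihood full-covariance Gaussian for the rows of x."""
    mean = x.mean(axis=0)
    centered = x - mean
    return mean, centered.T @ centered / len(x)

def bootstrap_aggregate(x, n_models, rng):
    """Fit one Gaussian per bootstrap resample and pool them with equal weights."""
    comps = []
    for _ in range(n_models):
        idx = rng.integers(0, len(x), size=len(x))  # sample with replacement
        mean, cov = fit_gaussian(x[idx])
        comps.append((1.0 / n_models, mean, cov))
    return comps

def merge(a, b):
    """Moment-matched merge of two weighted Gaussians (weight, mean, cov)."""
    (wa, ma, ca), (wb, mb, cb) = a, b
    w = wa + wb
    m = (wa * ma + wb * mb) / w
    da, db = ma - m, mb - m
    c = (wa * (ca + np.outer(da, da)) + wb * (cb + np.outer(db, db))) / w
    return w, m, c

def merge_cost(a, b):
    """Assumed criterion: weighted log-determinant increase caused by the merge."""
    logdet = lambda cov: np.linalg.slogdet(cov)[1]
    w, _, c = merge(a, b)
    return 0.5 * (w * logdet(c) - a[0] * logdet(a[2]) - b[0] * logdet(b[2]))

def restructure(comps, target_size):
    """Greedily merge the cheapest pair until target_size components remain,
    then keep only the diagonal of each covariance (the last-stage conversion)."""
    comps = list(comps)
    while len(comps) > target_size:
        pairs = [(i, j) for i in range(len(comps)) for j in range(i + 1, len(comps))]
        i, j = min(pairs, key=lambda p: merge_cost(comps[p[0]], comps[p[1]]))
        merged = merge(comps[i], comps[j])
        comps = [c for k, c in enumerate(comps) if k not in (i, j)] + [merged]
    return [(w, m, np.diag(np.diag(c))) for w, m, c in comps]

rng = np.random.default_rng(0)
frames = rng.normal(size=(200, 3))               # synthetic stand-in for feature frames
aggregated = bootstrap_aggregate(frames, 8, rng)  # larger aggregated model
compact = restructure(aggregated, 2)              # reduced, diagonal-covariance model
```

In this sketch the aggregated mixture has eight components and the restructured one has two. Moment-matched merging preserves the total weight and the overall mean of the mixture, which is one simple way to keep the small model close to the aggregated one.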
Pages: 2252-2264
Page count: 13
Related papers
50 in total
  • [21] Multilingual broad phoneme recognition and language-independent spoken term detection for low-resourced languages
    Deekshitha, G.
    Mary, Leena
    [J]. JOURNAL OF KING SAUD UNIVERSITY-COMPUTER AND INFORMATION SCIENCES, 2022, 34 (09) : 7313 - 7323
  • [22] Improving Domain-specific SMT for Low-resourced Languages using Data from Different Domains
    Farhath, Fathima
    Theivendiram, Pranavan
    Ranathunga, Surangika
    Jayasena, Sanath
    Dias, Gihan
    [J]. PROCEEDINGS OF THE ELEVENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION (LREC 2018), 2018, : 3789 - 3794
  • [23] Word Sense Disambiguation for Morphologically Rich Low-Resourced Languages: A Systematic Literature Review and Meta-Analysis
    Masethe, Hlaudi Daniel
    Masethe, Mosima Anna
    Ojo, Sunday Olusegun
    Giunchiglia, Fausto
    Owolawi, Pius Adewale
    [J]. INFORMATION, 2024, 15 (09)
  • [24] Enabling Spoken Dialogue Systems for Low-Resourced Languages-End-to-End Dialect Recognition for North Sami
    Trung Ngo Trong
    Jokinen, Kristiina
    Hautamaki, Ville
    [J]. 9TH INTERNATIONAL WORKSHOP ON SPOKEN DIALOGUE SYSTEM TECHNOLOGY, 2019, 579 : 221 - 235
  • [25] Deep Learning Transformer Architecture for Named-Entity Recognition on Low-Resourced Languages: State of the art results
    Hanslo, Ridewaan
    [J]. PROCEEDINGS OF THE 2022 17TH CONFERENCE ON COMPUTER SCIENCE AND INTELLIGENCE SYSTEMS (FEDCSIS), 2022, : 53 - 60
  • [26] Acoustic Factor Analysis for Streamed Hidden Markov Modeling
    Chien, Jen-Tzung
    Ting, Chuan-Wei
    [J]. IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2009, 17 (07) : 1279 - 1291
  • [27] Acoustic Modeling for Under-resourced Languages: A Role in Vietnamese Soccer Video Retrieval
    Pham, Nhut M.
    Vu, Quan H.
    [J]. 2013 INTERNATIONAL CONFERENCE ON ADVANCED TECHNOLOGIES FOR COMMUNICATIONS (ATC), 2013, : 652 - 656
  • [28] The Best of both Worlds: Dual Channel Language modeling for Hope Speech Detection in low-resourced Kannada
    Hande, Adeep
    Hegde, Siddhanth U.
    Sangeetha, Sivanesan
    Priyadharshini, Ruba
    Chakravarthi, Bharathi Raja
    [J]. PROCEEDINGS OF THE SECOND WORKSHOP ON LANGUAGE TECHNOLOGY FOR EQUALITY, DIVERSITY AND INCLUSION (LTEDI 2022), 2022, : 127 - 135
  • [29] Factor analysis of acoustic features for streamed hidden Markov modeling
    Ting, Chuan-Wei
    Chien, Jen-Tzung
    [J]. 2007 IEEE WORKSHOP ON AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING, VOLS 1 AND 2, 2007, : 30 - 35
  • [30] Attention-Based Neural Machine Translation Approach for Low-Resourced Indic Languages-A Case of Sanskrit to Hindi Translation
    Bakarola, Vishvajit
    Nasriwala, Jitendra
    [J]. SMART SYSTEMS: INNOVATIONS IN COMPUTING (SSIC 2021), 2022, 235 : 565 - 572