A Comparative Study on Selecting Acoustic Modeling Units for WFST-based Mongolian Speech Recognition

被引:0
|
作者
Wang Yonghe [1 ]
Bao, Feilong [1 ]
Gao, Gaunglai [1 ]
机构
[1] Inner Mongolia Univ, Coll Comp Sci, 235 West Coll Rd, Hohhot 010021, Inner Mongolia, Peoples R China
关键词
Mongolian; speech recognition; acoustic modeling unit; alignment model; WFST;
D O I
10.1145/3617830
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Traditional weighted finite-state transducer- (WFST) based Mongolian automatic speech recognition (ASR) systems use phonemes as pronunciation lexicon modeling units. However, Mongolian is an agglutinative, low-resource language, and building an ASR system based on the phoneme pronunciation lexicon remains a challenge for various reasons. First, the phoneme pronunciation lexicon manually constructed by Mongolian linguists is finite, which is usually used to build a grapheme-to-phoneme conversion (G2P) model to frequently expand new words. However, the data sparsity decreases the robustness of the G2P model and affects the performance of the final ASR system. Second, homophones and polysyllabic words are common in Mongolian, which has a certain impact on the construction of the Mongolian acoustic model. To address these problems, in this work, we first propose a grapheme-to-phoneme alignment model to obtain the mapping relationship between phonemes and subword units. Then, we construct an acoustic subword segmentation set to segment words directly instead of using the traditional G2P method to predict phoneme sequences to expand the pronunciation lexicon. Further, by analyzing the Mongolian encoding form, we also propose an acoustic subword modeling units construction method that removes control characters. Finally, we investigate various acoustic subword modeling units for pronunciation lexicon construction for the Mongolian ASR system. Experiments on a Mongolian dataset with 325 hours of training show that the pronunciation lexicon based on the acoustic subword modeling unit can effectively construct the WFST-based Mongolian ASR system. Further, removing the control characters when building the acoustic subword modeling unit can further improve the ASR system performance.
引用
收藏
页数:20
相关论文
共 50 条
  • [31] Gated Recurrent Units Based Hybrid Acoustic Models for Robust Speech Recognition
    Kang, Jian
    Zhang, Wei-Qiang
    Liu, Jia
    2016 10TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP), 2016,
  • [32] Acoustic modelling for Chinese speech recognition: A comparative study of Mandarin and Cantonese
    Gao, S
    Lee, T
    Wong, YW
    Xu, B
    Ching, PC
    Huang, T
    2000 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, PROCEEDINGS, VOLS I-VI, 2000, : 1261 - 1264
  • [33] Acoustic modeling for speech recognition based on a generalized Laplacian mixture distribution
    Nakamura, A
    ELECTRONICS AND COMMUNICATIONS IN JAPAN PART II-ELECTRONICS, 2002, 85 (11): : 32 - 42
  • [34] Multidialectal Spanish acoustic modeling for speech recognition
    Caballero, Monica
    Moreno, Asuncion
    Nogueiras, Albino
    SPEECH COMMUNICATION, 2009, 51 (03) : 217 - 229
  • [35] Acoustic Modeling in Speech Recognition: A Systematic Review
    Bhatt, Shobha
    Jain, Anurag
    Dev, Amita
    INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2020, 11 (04) : 397 - 412
  • [36] Joint acoustic and language modeling for speech recognition
    Chien, Jen-Tzung
    Chueh, Chuang-Hua
    SPEECH COMMUNICATION, 2010, 52 (03) : 223 - 235
  • [37] FEDERATED ACOUSTIC MODELING FOR AUTOMATIC SPEECH RECOGNITION
    Cui, Xiaodong
    Lu, Songtao
    Kingsbury, Brian
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 6748 - 6752
  • [38] Mongolian Speech Recognition Based on Deep Neural Networks
    Zhang, Hui
    Bao, Feilong
    Gao, Guanglai
    CHINESE COMPUTATIONAL LINGUISTICS AND NATURAL LANGUAGE PROCESSING BASED ON NATURALLY ANNOTATED BIG DATA (CCL 2015), 2015, 9427 : 180 - 188
  • [39] Combined optimisation of baseforms and model parameters in speech recognition based on acoustic subword units
    Holter, T
    Svendsen, T
    1997 IEEE WORKSHOP ON AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING, PROCEEDINGS, 1997, : 199 - 206
  • [40] Investigating The Use Of Syllable Acoustic Units For Amharic Speech Recognition
    Dribssa, Adey Edessa
    Tachbelie, Martha Yifiru
    PROCEEDINGS OF THE 2015 12TH IEEE AFRICON INTERNATIONAL CONFERENCE - GREEN INNOVATION FOR AFRICAN RENAISSANCE (AFRICON), 2015,