Pronunciation augmentation for Mandarin-English code-switching speech recognition

被引:5
|
作者
Long, Yanhua [1 ]
Wei, Shuang [1 ]
Lian, Jie [1 ]
Li, Yijie [2 ]
机构
[1] Shanghai Normal Univ, SHNU Unisound Joint Lab Nat Human Comp Interact, Shanghai Engn Res Ctr Intelligent Educ & Bigdata, Shanghai, Peoples R China
[2] Unisound AI Technol Co Ltd, Beijing, Peoples R China
基金
中国国家自然科学基金;
关键词
Code-switching; Phone mapping; Pronunciation variation; Lexicon learning; Speech recognition; NEURAL-NETWORKS; MODELS;
D O I
10.1186/s13636-021-00222-7
中图分类号
O42 [声学];
学科分类号
070206 ; 082403 ;
摘要
Code-switching (CS) refers to the phenomenon of using more than one language in an utterance, and it presents great challenge to automatic speech recognition (ASR) due to the code-switching property in one utterance, the pronunciation variation phenomenon of the embedding language words and the heavy training data sparse problem. This paper focuses on the Mandarin-English CS ASR task. We aim at dealing with the pronunciation variation and alleviating the sparse problem of code-switches by using pronunciation augmentation methods. An English-to-Mandarin mix-language phone mapping approach is first proposed to obtain a language-universal CS lexicon. Based on this lexicon, an acoustic data-driven lexicon learning framework is further proposed to learn new pronunciations to cover the accents, mis-pronunciations, or pronunciation variations of those embedding English words. Experiments are performed on real CS ASR tasks. Effectiveness of the proposed methods are examined on all of the conventional, hybrid, and the recent end-to-end speech recognition systems. Experimental results show that both the learned phone mapping and augmented pronunciations can significantly improve the performance of code-switching speech recognition.
引用
收藏
页数:14
相关论文
共 50 条
  • [1] Pronunciation augmentation for Mandarin-English code-switching speech recognition
    Yanhua Long
    Shuang Wei
    Jie Lian
    Yijie Li
    [J]. EURASIP Journal on Audio, Speech, and Music Processing, 2021
  • [2] Mandarin-English Code-switching Speech Recognition
    Xu, Haihua
    Van Tung Pham
    Kyaw, Zin Tun
    Lim, Zhi Hao
    Chng, Eng Siong
    Li, Haizhou
    [J]. 19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, : 554 - 555
  • [3] Acoustic data augmentation for Mandarin-English code-switching speech recognition
    Long, Yanhua
    Li, Yijie
    Zhang, Qiaozheng
    Wei, Shuang
    Ye, Hong
    Yang, Jichen
    [J]. APPLIED ACOUSTICS, 2020, 161
  • [4] NON-AUTOREGRESSIVE MANDARIN-ENGLISH CODE-SWITCHING SPEECH RECOGNITION
    Chuang, Shun-Po
    Chang, Heng-Jui
    Huang, Sung-Feng
    Lee, Hung-yi
    [J]. 2021 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU), 2021, : 465 - 472
  • [5] Cyclic Transfer Learning for Mandarin-English Code-Switching Speech Recognition
    Nga, Cao Hong
    Vu, Duc-Quang
    Luong, Huong Hoang
    Huang, Chien-Lin
    Wang, Jia-Ching
    [J]. IEEE SIGNAL PROCESSING LETTERS, 2023, 30 : 1387 - 1391
  • [6] ADDRESSING ACCENT MISMATCH IN MANDARIN-ENGLISH CODE-SWITCHING SPEECH RECOGNITION
    Tan, Zhili
    Fan, Xinghua
    Zhu, Hui
    Lin, Ed
    [J]. 2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 8259 - 8263
  • [7] On the End-to-End Solution to Mandarin-English Code-switching Speech Recognition
    Zeng, Zhiping
    Khassanov, Yerbolat
    Van Tung Pham
    Xu, Haihua
    Chng, Eng Siong
    Li, Haizhou
    [J]. INTERSPEECH 2019, 2019, : 2165 - 2169
  • [8] INVESTIGATING END-TO-END SPEECH RECOGNITION FOR MANDARIN-ENGLISH CODE-SWITCHING
    Shan, Changhao
    Weng, Chao
    Wang, Guangsen
    Su, Dan
    Luo, Min
    Yu, Dong
    Xie, Lei
    [J]. 2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 6056 - 6060
  • [9] A Mandarin-English Code-Switching Corpus
    Li, Ying
    Yu, Yue
    Fung, Pascale
    [J]. LREC 2012 - EIGHTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2012, : 2515 - 2519
  • [10] Integrating Knowledge in End-to-End Automatic Speech Recognition for Mandarin-English Code-Switching
    Li, Chia-Yu
    Ngoc Thang Vu
    [J]. PROCEEDINGS OF THE 2019 INTERNATIONAL CONFERENCE ON ASIAN LANGUAGE PROCESSING (IALP), 2019, : 160 - 165