Improved Deliberation Network with Text Pre-training for Code-Switching Automatic Speech Recognition

Cited by: 0
Authors
Shen, Zhijie [1]
Guo, Wu [1]
Affiliations
[1] Univ Sci & Technol China, Dept Elect Engn & Informat Sci EEIS, Hefei, Peoples R China
Source
Funding
National Natural Science Foundation of China
Keywords
automatic speech recognition; code-switching; deliberation network; text pre-training
DOI
10.21437/Interspeech.2022-221
Chinese Library Classification
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
This paper proposes an improved deliberation network (DN) for end-to-end code-switching (CS) automatic speech recognition (ASR). In a conventional DN, the acoustic encoding and the first-pass hypothesis encoding are computed separately and simply combined by summation, which cannot take full advantage of their potential complementarity. Hence, the proposed improved DN model exploits the relationship between the two encodings through a two-stage process: first, it integrates the two encodings into a unified semantic space through a shared encoder; second, it captures the relevant information from the acoustic encoding through an attention mechanism before the final decoding process. Moreover, the lack of paired training data restricts the generalization ability of the model in CS ASR. To address this problem, the developed DN is pre-trained with a denoising sequence-to-sequence (seq2seq) objective using unpaired text data. Experiments on a Chinese-English CS dataset demonstrate the effectiveness of the proposed method. Compared with the conventional DN, a 13.5% relative error rate reduction is observed.
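The abstract does not say how the text is corrupted for the denoising seq2seq objective. As a rough illustration only, the sketch below assumes simple random token masking: a clean sentence from unpaired text is the target, and a copy with some tokens replaced by a mask symbol is the input. The names `MASK`, `corrupt`, and `make_pretraining_pair`, the masking probability, and the example sentence are all hypothetical and are not taken from the paper.

```python
import random

MASK = "<mask>"  # hypothetical mask symbol, not taken from the paper


def corrupt(tokens, mask_prob=0.3, seed=None):
    """Return a noisy copy of `tokens`, replacing each token by MASK
    independently with probability `mask_prob` (assumed noise function)."""
    rng = random.Random(seed)
    return [MASK if rng.random() < mask_prob else tok for tok in tokens]


def make_pretraining_pair(sentence, mask_prob=0.3, seed=None):
    """Build one (noisy input, clean target) pair from a whitespace-tokenized
    sentence of unpaired text."""
    target = sentence.split()
    source = corrupt(target, mask_prob=mask_prob, seed=seed)
    return source, target


if __name__ == "__main__":
    # Made-up Chinese-English code-switching sentence, for illustration only.
    src, tgt = make_pretraining_pair("我 想 听 一首 jazz music", seed=0)
    print(src)  # noisy input the model would read
    print(tgt)  # clean sentence the model would be trained to reconstruct
```

A seq2seq model pre-trained this way learns to reconstruct the clean sentence from the noisy one, which needs text only and no paired audio.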
Pages: 3854-3858
Number of pages: 5