A Unified Recognition and Correction Model under Noisy and Accent Speech Conditions

Times cited: 0
Authors
Yang, Zhao [1, 3]
Ng, Dianwen [2, 3]
Zhang, Chong
Jiang, Rui [1]
Xi, Wei [1]
Ma, Yukun [2]
Ni, Chongjia [2]
Zhao, Jizhong [1]
Ma, Bin [2]
Chng, Eng Siong [3]
Affiliations
[1] Xi An Jiao Tong Univ, Fac Elect & Informat Engn, Xian, Peoples R China
[2] Alibaba Grp, Speech Lab DAMO Acad, Hangzhou, Peoples R China
[3] Nanyang Technol Univ, Singapore, Singapore
Source
Interspeech 2023
Funding
National Key R&D Program of China
Keywords
Speech Recognition; Error Correction; Unified Model; Interactive Training; Noisy and Accented Speech;
DOI
10.21437/Interspeech.2023-1300
Chinese Library Classification (CLC) number
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
Automatic speech recognition (ASR) and its post-processing, such as recognition error correction, are usually cascaded in a pipeline that ignores their strong interconnection. Inspired by recent progress in leveraging text data to improve linguistic modeling, we propose a Unified ASR and error Correction framework (UAC) that couples speech recognition and error correction to capture richer semantic information and thereby improve recognition performance. The framework establishes interaction between speech and textual representations by explicitly fusing their uni-modal embeddings in a shared encoder. It can operate in either a synchronous or an asynchronous variant and can be equipped with modality and task tags that improve its adaptation to heterogeneous inputs. Experimental results on accented and noisy speech datasets show that our method achieves a lower word error rate than the pipeline baselines.
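The abstract describes fusing speech and text embeddings in a shared encoder, optionally marked with modality and task tags. The sketch below shows one generic way tag-prefixed fusion can look; it is not the UAC implementation. Assumptions not taken from the paper: fusion is plain concatenation along the sequence axis, each tag is one vector, the shared encoder is a single attention head with identity projections, and embeddings are random 4-dimensional toys. All names (`fuse`, `shared_encoder_layer`, `DIM`, and the tag keys `"speech"`, `"text"`, `"asr"`, `"correction"`) are hypothetical.

```python
import math
import random

DIM = 4  # hypothetical toy embedding size, not taken from the paper


def rand_vec(rng, dim=DIM):
    """Random vector standing in for a learned embedding (illustration only)."""
    return [rng.uniform(-1.0, 1.0) for _ in range(dim)]


def fuse(speech_emb, text_emb, tags, task):
    """Concatenate tagged uni-modal sequences along the time axis.

    Assumed layout: [task tag] + [speech tag] + speech frames
    + [text tag] + text tokens. The text segment is omitted when
    text_emb is empty, giving a speech-only input.
    """
    seq = [tags[task], tags["speech"]] + list(speech_emb)
    if text_emb:
        seq += [tags["text"]] + list(text_emb)
    return seq


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def shared_encoder_layer(seq):
    """One shared single-head scaled dot-product self-attention layer.

    Q/K/V projections are identity to keep the sketch short; a real
    encoder would use learned projections, feed-forward blocks, and
    several layers.
    """
    d = len(seq[0])
    out = []
    for q in seq:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in seq]
        weights = softmax(scores)
        # Each output position is a weighted mix over ALL positions,
        # so speech frames and text tokens attend to each other.
        out.append([sum(w * v[j] for w, v in zip(weights, seq)) for j in range(d)])
    return out


rng = random.Random(0)
speech = [rand_vec(rng) for _ in range(5)]  # 5 toy acoustic frames
text = [rand_vec(rng) for _ in range(3)]    # 3 toy hypothesis tokens
tags = {name: rand_vec(rng) for name in ("speech", "text", "asr", "correction")}
fused = fuse(speech, text, tags, task="correction")
encoded = shared_encoder_layer(fused)
print(len(encoded), len(encoded[0]))  # prints: 11 4
```

Because every position attends over the whole fused sequence, acoustic and textual positions can exchange information inside one encoder, and the tag vectors give that shared encoder a signal about which modality and task each segment belongs to.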
Pages: 4953-4957
Number of pages: 5