A Unified Recognition and Correction Model under Noisy and Accent Speech Conditions

Authors
Yang, Zhao [1, 3]
Ng, Dianwen [2, 3]
Zhang, Chong
Jiang, Rui [1]
Xi, Wei [1]
Ma, Yukun [2]
Ni, Chongjia [2]
Zhao, Jizhong [1]
Ma, Bin [2]
Chng, Eng Siong [3]
Affiliations
[1] Xi An Jiao Tong Univ, Fac Elect & Informat Engn, Xian, Peoples R China
[2] Alibaba Grp, Speech Lab DAMO Acad, Hangzhou, Peoples R China
[3] Nanyang Technol Univ, Singapore, Singapore
Source
INTERSPEECH 2023
Funding
National Key Research and Development Program of China
Keywords
Speech Recognition; Error Correction; Unified Model; Interactive Training; Noisy and Accented Speech
DOI
10.21437/Interspeech.2023-1300
Chinese Library Classification (CLC) Number
O42 [Acoustics]
Discipline Classification Codes
070206; 082403
Abstract
Automatic speech recognition (ASR) and its post-processing, such as recognition error correction, are usually cascaded in a pipeline that ignores their strong interconnection. Inspired by recent progress in leveraging text data to improve linguistic modeling, we propose a Unified ASR and error Correction framework (UAC). UAC couples speech recognition and error correction so that the model captures richer semantic information and improves recognition performance. The proposed framework establishes interaction between speech and textual representations by explicitly fusing their uni-modal embeddings in a shared encoder. The framework can also operate in either a synchronous or an asynchronous variant. It can additionally be equipped with modality and task tags, which enhance its adaptation to heterogeneous inputs. Experimental results on accented and noisy speech datasets demonstrate that our method achieves lower word error rates than the pipeline baselines.
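The results above are reported as word error rate (WER). WER is conventionally computed from the word-level edit distance between a reference transcript and an ASR hypothesis. The formula is (substitutions + deletions + insertions) divided by the number of reference words.

The following is a minimal Python sketch of that standard definition. It assumes simple whitespace tokenization and no text normalization. It is not the authors' evaluation code, and the function name `word_error_rate` is illustrative.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Standard WER = (S + D + I) / N via word-level Levenshtein distance.

    Assumption: tokens are whitespace-separated words with no normalization
    (no lowercasing or punctuation stripping); this is an illustrative sketch.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = minimum edits turning the first i-1 reference words into hyp[:j].
    # Row 0: turning an empty reference into hyp[:j] needs j insertions.
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        # Column 0: turning ref[:i] into an empty hypothesis needs i deletions.
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of ref[i-1]
                curr[j - 1] + 1,     # insertion of hyp[j-1]
                prev[j - 1] + cost,  # substitution (or match if cost == 0)
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


if __name__ == "__main__":
    # One substitution (sat -> sit) and one deletion ("the") over 6 reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

In this example, the hypothesis has one substitution and one deletion against six reference words, which gives a WER of about 0.333. WER is not bounded by 1.0, because a hypothesis with many insertions can exceed the reference length in edits.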
Pages: 4953-4957
Number of pages: 5