A Unified Recognition and Correction Model under Noisy and Accent Speech Conditions

Authors
Yang, Zhao [1, 3]
Ng, Dianwen [2, 3]
Zhang, Chong
Jiang, Rui [1]
Xi, Wei [1]
Ma, Yukun [2]
Ni, Chongjia [2]
Zhao, Jizhong [1]
Ma, Bin [2]
Chng, Eng Siong [3]
Affiliations
[1] Xi An Jiao Tong Univ, Fac Elect & Informat Engn, Xian, Peoples R China
[2] Alibaba Grp, Speech Lab DAMO Acad, Hangzhou, Peoples R China
[3] Nanyang Technol Univ, Singapore, Singapore
Source
Funding
National Key Research and Development Program of China;
Keywords
Speech Recognition; Error Correction; Unified Model; Interactive Training; Noisy and Accented Speech;
DOI
10.21437/Interspeech.2023-1300
Chinese Library Classification (CLC) number
O42 [Acoustics];
Discipline classification code
070206; 082403;
Abstract
Automatic speech recognition (ASR) and its post-processing, such as recognition error correction, are usually cascaded in a pipeline, ignoring their strong interconnection. Inspired by recent progress in leveraging text data to improve linguistic modeling, we propose a Unified ASR and error Correction framework (UAC) that couples speech recognition and error correction to capture richer semantic information and improve speech recognition performance. The proposed framework establishes interaction between speech and textual representations by explicitly fusing their uni-modal embeddings in a shared encoder. Additionally, the framework can operate in either a synchronous or an asynchronous variant and can be equipped with modality and task tags that enhance its adaptation to heterogeneous inputs. Experimental results on accented and noisy speech datasets demonstrate that our method improves word error rate over the pipeline baselines.
Pages: 4953-4957
Page count: 5