A Unified Recognition and Correction Model under Noisy and Accent Speech Conditions

Cited by: 0
Authors
Yang, Zhao [1, 3]
Ng, Dianwen [2, 3]
Zhang, Chong
Jiang, Rui [1 ]
Xi, Wei [1 ]
Ma, Yukun [2 ]
Ni, Chongjia [2 ]
Zhao, Jizhong [1 ]
Ma, Bin [2 ]
Chng, Eng Siong [3 ]
Affiliations
[1] Xi An Jiao Tong Univ, Fac Elect & Informat Engn, Xian, Peoples R China
[2] Alibaba Grp, Speech Lab DAMO Acad, Hangzhou, Peoples R China
[3] Nanyang Technol Univ, Singapore, Singapore
Source
Funding
National Key Research and Development Program of China;
Keywords
Speech Recognition; Error Correction; Unified Model; Interactive Training; Noisy and Accented Speech;
DOI
10.21437/Interspeech.2023-1300
Chinese Library Classification (CLC) number
O42 [Acoustics];
Discipline classification code
070206; 082403;
Abstract
Automatic speech recognition (ASR) and its post-processing, such as recognition error correction, are usually cascaded in a pipeline that ignores their strong interconnection. Inspired by recent progress in leveraging text data to improve linguistic modeling, we propose a Unified ASR and error Correction framework (UAC), which couples speech recognition and error correction to capture richer semantic information and improve speech recognition performance. The proposed framework establishes interaction between speech and textual representations by explicitly fusing their uni-modal embeddings in a shared encoder. Additionally, the framework can operate in either a synchronous or an asynchronous variant and can be equipped with modality and task tags that enhance its adaptation to heterogeneous inputs. Experimental results on accented and noisy speech datasets demonstrate that our method achieves a lower word error rate than the pipeline baselines.
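The abstract reports its results as word error rate (WER). As a reminder of how that metric is conventionally computed, here is a minimal sketch: the word-level edit distance between a reference transcript and a hypothesis, divided by the number of reference words. The function name `word_error_rate` and the example sentences are illustrative assumptions, not taken from the paper, and the paper's own scoring setup (text normalization, tokenization) is not described in this record.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Hypothetical helper for illustration; not the paper's scoring script.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Made-up transcripts: one word dropped out of six reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

Under this definition, a correction stage helps only if its output is closer to the reference than the raw ASR hypothesis, which is the comparison the abstract draws against the pipeline baselines.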
Pages: 4953-4957
Page count: 5