A Unified Recognition and Correction Model under Noisy and Accent Speech Conditions

Cited by: 0

Authors:

Yang, Zhao [1,3]

Ng, Dianwen [2,3]

Zhang, Chong

Jiang, Rui [1]

Xi, Wei [1]

Ma, Yukun [2]

Ni, Chongjia [2]

Zhao, Jizhong [1]

Ma, Bin [2]

Chng, Eng Siong [3]

Affiliations:

[1] Xi An Jiao Tong Univ, Fac Elect & Informat Engn, Xian, Peoples R China

[2] Alibaba Grp, Speech Lab DAMO Acad, Hangzhou, Peoples R China

[3] Nanyang Technol Univ, Singapore, Singapore

Source:

INTERSPEECH 2023 | 2023

Funding:

National Key Research and Development Program;

Keywords:

Speech Recognition; Error Correction; Unified Model; Interactive Training; Noisy and Accented Speech;

DOI:

10.21437/Interspeech.2023-1300

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification codes:

070206 ; 082403 ;

Abstract:

Automatic speech recognition (ASR) and its post-processing, such as recognition error correction, are usually cascaded in a pipeline that ignores their strong interconnection. Inspired by recent progress in leveraging text data to improve linguistic modeling, we propose a Unified ASR and error Correction framework (UAC) that couples speech recognition and error correction to capture richer semantic information and improve speech recognition performance. The proposed framework establishes interaction between speech and textual representations by explicitly fusing their uni-modal embeddings in a shared encoder. Additionally, the framework can operate in either a synchronous or an asynchronous variant and can be equipped with modality and task tags that enhance its adaptation to heterogeneous inputs. Experimental results on accented and noisy speech datasets demonstrate that our method effectively reduces word error rate compared with the pipeline baselines.
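The abstract reports its results as word error rate (WER). As a reading aid, here is a minimal sketch of the standard WER definition: word-level edit distance divided by the number of reference words. It is not the authors' evaluation code. The function name `word_error_rate` and the example sentences are hypothetical and do not come from the paper or its datasets.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference prefix processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing in hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical sentences for illustration only (assumption, not paper data).
reference = "turn on the kitchen light"
asr_output = "turn on the chicken light"   # one substituted word
corrected = "turn on the kitchen light"    # after an error-correction step

print(word_error_rate(reference, asr_output))  # 0.2
print(word_error_rate(reference, corrected))   # 0.0
```

In this toy case a correction step that repairs the single substituted word lowers WER from 0.2 to 0.0. This is the kind of improvement the abstract's comparison against pipeline baselines measures.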

Citation

Pages: 4953-4957

Page count: 5
