A Unified Recognition and Correction Model under Noisy and Accent Speech Conditions

Cited by: 0

Authors:

Yang, Zhao [1,3]

Ng, Dianwen [2,3]

Zhang, Chong

Jiang, Rui [1]

Xi, Wei [1]

Ma, Yukun [2]

Ni, Chongjia [2]

Zhao, Jizhong [1]

Ma, Bin [2]

Chng, Eng Siong [3]

Affiliations:

[1] Xi An Jiao Tong Univ, Fac Elect & Informat Engn, Xian, Peoples R China

[2] Alibaba Grp, Speech Lab DAMO Acad, Hangzhou, Peoples R China

[3] Nanyang Technol Univ, Singapore, Singapore

Source:

INTERSPEECH 2023 | 2023

Funding:

National Key Research and Development Program of China;

Keywords:

Speech Recognition; Error Correction; Unified Model; Interactive Training; Noisy and Accented Speech;

DOI:

10.21437/Interspeech.2023-1300

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification codes:

070206 ; 082403 ;

Abstract:

Automatic speech recognition (ASR) and its post-processing, such as recognition error correction, are usually cascaded in a pipeline that ignores their strong interconnection. Inspired by recent progress in leveraging text data to improve linguistic modeling, we propose a Unified ASR and error Correction framework (UAC) that couples speech recognition and error correction to capture richer semantic information and improve speech recognition performance. The proposed framework establishes interaction between speech and textual representations by explicitly fusing their uni-modal embeddings in a shared encoder. Additionally, the framework can operate in either a synchronous or an asynchronous variant and can be equipped with modality and task tags that enhance its adaptation to heterogeneous inputs. Experimental results on accented and noisy speech datasets demonstrate that our method achieves a lower word error rate than the pipeline baselines.
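The abstract reports its results as word error rate (WER). As a reminder of what that metric measures, here is a minimal sketch of the standard definition: the word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. The function name `word_error_rate` and the example sentences are illustrative assumptions; this is not the evaluation script used in the paper, and real toolkits usually normalize text before scoring.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = word-level edit distance / number of reference words.

    Illustrative helper (hypothetical name), assuming whitespace tokenization
    and no text normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Made-up example: one substitution ("sat" -> "sit") and one deletion ("the")
# against a six-word reference.
wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
```

A lower value is better, and the value can exceed 1.0 when the hypothesis contains many inserted words.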


Pages: 4953-4957

Page count: 5

