CYCLEGAN BANDWIDTH EXTENSION ACOUSTIC MODELING FOR AUTOMATIC SPEECH RECOGNITION

Cited by: 0

Authors:

Haws, David [1]

Cui, Xiaodong [1]

Affiliations:

[1] IBM TJ Watson Res Ctr, IBM Res AI, Yorktown Hts, NY 10598 USA

Source:

2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2019

Keywords:

speech recognition; deep neural networks; bandwidth extension; cycle consistent generative adversarial networks; acoustic modeling;

DOI:

Not available

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification codes:

070206 ; 082403 ;

Abstract:

Although narrowband (NB) and wideband (WB) speech data differ primarily in sampling rate, these two common input sources are difficult to model simultaneously for automatic speech recognition (ASR). Meanwhile, cycle-consistent generative adversarial networks (CycleGANs) have shown value in a number of acoustic tasks, such as mapping between domains, due to their powerful generators. We apply CycleGAN to the task of bandwidth extension (BWE) and test a variety of architectures. The CycleGANs produce encouraging losses and reconstructed spectrograms. To further reduce word error rate (WER), we add a discriminative loss to the CycleGAN BWE architecture. This more closely matches our ASR goal, and we show gains in WER compared to a standard BWE model discriminatively trained only to map from upsampled narrowband (UNB) to WB data.
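
The abstract does not spell out the training objective, so the following is only a rough sketch of how a CycleGAN generator objective with an extra discriminative term might be assembled. Everything here is an assumption made for illustration: the function name `generator_objective`, the weights `lam_cyc` and `lam_disc`, the least-squares form of the adversarial term, the L1 cycle term, and the reading of the "discriminative loss" as a paired mean-squared error between the generated WB features and the true WB features. None of these choices is confirmed by the record above.

```python
import numpy as np


def l1(a, b):
    """Mean absolute difference (used here for the cycle-consistency term)."""
    return float(np.mean(np.abs(a - b)))


def mse(a, b):
    """Mean squared difference; b may be an array or a scalar target."""
    return float(np.mean((a - b) ** 2))


def generator_objective(unb, wb, g_up, g_down, d_wb, d_unb,
                        lam_cyc=10.0, lam_disc=1.0):
    """Hypothetical generator-side loss for paired UNB/WB feature batches.

    unb, wb  : arrays of shape (frames, features); assumed frame-aligned
    g_up     : callable mapping UNB features to estimated WB features
    g_down   : callable mapping WB features to estimated UNB features
    d_wb     : callable scoring WB-domain features (1 = judged real)
    d_unb    : callable scoring UNB-domain features (1 = judged real)
    """
    fake_wb = g_up(unb)
    fake_unb = g_down(wb)

    # Adversarial term (least-squares variant, an assumption): each generator
    # tries to make the matching discriminator output 1 on its samples.
    adv = mse(d_wb(fake_wb), 1.0) + mse(d_unb(fake_unb), 1.0)

    # Cycle-consistency term: mapping to the other domain and back should
    # reproduce the input.
    cyc = l1(g_down(fake_wb), unb) + l1(g_up(fake_unb), wb)

    # Extra supervised term (assumed reading of the "discriminative loss"):
    # the generated WB features should match the true WB features.
    disc = mse(fake_wb, wb)

    total = adv + lam_cyc * cyc + lam_disc * disc
    return {"adv": adv, "cyc": cyc, "disc": disc, "total": total}


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    unb = rng.normal(size=(8, 40))               # toy stand-in for UNB features
    wb = unb + 0.1 * rng.normal(size=(8, 40))    # toy stand-in for WB features
    identity = lambda x: x                       # placeholder generators
    half = lambda x: np.full((x.shape[0], 1), 0.5)  # placeholder discriminators
    print(generator_objective(unb, wb, identity, identity, half, half))
```

In this sketch, setting `lam_disc=0` leaves an ordinary CycleGAN generator objective, while dropping the adversarial and cycle terms leaves only the paired regression that the abstract describes as the standard BWE baseline.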


Pages: 6780-6784

Page count: 5

