CYCLEGAN BANDWIDTH EXTENSION ACOUSTIC MODELING FOR AUTOMATIC SPEECH RECOGNITION

Authors
Haws, David [1]
Cui, Xiaodong [1]
Affiliation
[1] IBM T. J. Watson Research Center, IBM Research AI, Yorktown Heights, NY 10598, USA
Keywords
speech recognition; deep neural networks; bandwidth extension; cycle-consistent generative adversarial networks; acoustic modeling
DOI
Not available
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Although narrowband (NB) and wideband (WB) speech data differ primarily in sampling rate, these two common input sources are difficult to model simultaneously for automatic speech recognition (ASR). Meanwhile, cycle-consistent generative adversarial networks (CycleGANs) have proven valuable in a number of acoustic tasks, such as mapping between domains, owing to their powerful generators. We apply CycleGANs to the task of bandwidth extension (BWE) and test a variety of architectures. The CycleGANs produce encouraging losses and reconstructed spectrograms. To further reduce word error rates (WERs), we add a discriminative loss to the CycleGAN BWE architecture. This more closely matches our ASR goal, and we show WER improvements compared with a standard BWE model trained discriminatively only to map from upsampled narrowband (UNB) to WB data.
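The abstract does not state the loss formulation. As an illustration only, the sketch below combines the standard CycleGAN generator objective (least-squares adversarial terms for both mapping directions plus an L1 cycle-consistency term weighted by lambda_cyc = 10, as in the original CycleGAN) with an extra paired term that regresses G(UNB) onto the matching WB features. Treating the "additional discriminative loss" as this paired L1 regression, the (frames, bins) feature shapes, and the names G, F, D_wb, D_unb, lambda_cyc, and lambda_sup are all assumptions for illustration, not details taken from the paper. Only the generator side is shown; the discriminators would be trained with their own loss.

```python
import numpy as np


def lsgan_gen_loss(d_out):
    """Least-squares GAN generator term: push discriminator scores toward 1 ("real")."""
    return float(np.mean((np.asarray(d_out) - 1.0) ** 2))


def l1(a, b):
    """Mean absolute error between two feature arrays of equal shape."""
    return float(np.mean(np.abs(np.asarray(a) - np.asarray(b))))


def cyclegan_bwe_generator_loss(x_unb, y_wb, G, F, D_wb, D_unb,
                                lambda_cyc=10.0, lambda_sup=1.0):
    """Hypothetical generator objective: CycleGAN loss plus a paired UNB->WB term.

    x_unb, y_wb : paired log-spectral features, shape (frames, bins) (assumed)
    G           : maps UNB features to WB features
    F           : maps WB features back to UNB features
    D_wb, D_unb : discriminators returning one score per frame
    """
    fake_wb = G(x_unb)   # bandwidth-extended version of the UNB input
    fake_unb = F(y_wb)   # WB input mapped into the UNB domain

    # Adversarial terms for both mapping directions.
    adv = lsgan_gen_loss(D_wb(fake_wb)) + lsgan_gen_loss(D_unb(fake_unb))
    # Cycle consistency: UNB -> WB -> UNB and WB -> UNB -> WB should reconstruct inputs.
    cyc = l1(F(fake_wb), x_unb) + l1(G(fake_unb), y_wb)
    # Assumed extra term: direct regression of G(UNB) onto the paired WB target.
    sup = l1(fake_wb, y_wb)

    total = adv + lambda_cyc * cyc + lambda_sup * sup
    return {"adv": adv, "cyc": cyc, "sup": sup, "total": total}
```

Under these assumptions, setting lambda_sup=0 recovers a plain CycleGAN generator loss, while keeping only the sup term corresponds to a baseline trained solely to map UNB to WB features.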
Pages: 6780-6784
Number of pages: 5