CYCLEGAN BANDWIDTH EXTENSION ACOUSTIC MODELING FOR AUTOMATIC SPEECH RECOGNITION

Authors
Haws, David [1]
Cui, Xiaodong [1]
Affiliation
[1] IBM T. J. Watson Research Center, IBM Research AI, Yorktown Heights, NY 10598, USA
Keywords
speech recognition; deep neural networks; bandwidth extension; cycle-consistent generative adversarial networks; acoustic modeling
DOI
Not available
Chinese Library Classification (CLC) number
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
Although narrowband (NB) and wideband (WB) speech data differ primarily in sampling rate, these two common input sources are difficult to model simultaneously for automatic speech recognition (ASR). Meanwhile, cycle-consistent generative adversarial networks (CycleGANs) have been shown to be valuable in a number of acoustic tasks, such as mapping between domains, owing to their powerful generators. We apply CycleGANs to the task of bandwidth extension (BWE) and test a variety of architectures. The CycleGANs produce encouraging losses and reconstructed spectrograms. To further reduce word error rates (WER), we add a discriminative loss to the CycleGAN BWE architecture. This more closely matches our ASR goal, and we show WER gains over a standard BWE model that is discriminatively trained only to map from upsampled narrowband (UNB) to WB data.
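The abstract describes a CycleGAN bandwidth-extension generator objective augmented with an extra discriminative loss. The sketch below shows one plausible way such terms could be combined. It is not the authors' implementation, and it rests on several assumptions. The least-squares adversarial form and the default `lambda_cyc = 10` follow common CycleGAN practice. The "discriminative" term is assumed to be a frame-level MSE between generated and parallel (frame-aligned) WB features. All function names and weights are hypothetical.

```python
import numpy as np


def lsgan_generator_loss(d_fake: np.ndarray) -> float:
    """Least-squares GAN generator term: push discriminator scores on fakes toward 1."""
    return float(np.mean((d_fake - 1.0) ** 2))


def mean_abs_error(a: np.ndarray, b: np.ndarray) -> float:
    """L1 distance, used here for the cycle-consistency term."""
    return float(np.mean(np.abs(a - b)))


def cyclegan_bwe_generator_loss(x_unb, y_wb, G, F, D_wb, D_unb,
                                lambda_cyc=10.0, lambda_disc=1.0):
    """Hypothetical combined generator objective for CycleGAN BWE.

    x_unb       : (frames, dims) upsampled-narrowband features
    y_wb        : (frames, dims) wideband features, ASSUMED frame-aligned with
                  x_unb so that the paired (discriminative) term is defined
    G           : UNB -> WB generator; F : WB -> UNB generator
    D_wb, D_unb : discriminators returning one score per frame
    """
    fake_wb = G(x_unb)   # extend bandwidth
    fake_unb = F(y_wb)   # remove bandwidth

    # Adversarial terms: each generator tries to fool its target-domain discriminator.
    adv = lsgan_generator_loss(D_wb(fake_wb)) + lsgan_generator_loss(D_unb(fake_unb))

    # Cycle consistency: UNB -> WB -> UNB and WB -> UNB -> WB should reconstruct inputs.
    cyc = mean_abs_error(F(fake_wb), x_unb) + mean_abs_error(G(fake_unb), y_wb)

    # ASSUMED "discriminative" term: supervised MSE from UNB input to parallel WB target.
    disc = float(np.mean((fake_wb - y_wb) ** 2))

    total = adv + lambda_cyc * cyc + lambda_disc * disc
    return total, {"adv": adv, "cyc": cyc, "disc": disc}


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal((8, 40))              # toy UNB features: 8 frames x 40 dims
    y = x + 0.1 * rng.standard_normal((8, 40))    # toy "parallel" WB features
    G = lambda z: z                               # placeholder identity generators
    F = lambda z: z
    D = lambda z: 1.0 / (1.0 + np.exp(-z.mean(axis=1)))  # toy sigmoid scorer
    total, parts = cyclegan_bwe_generator_loss(x, y, G, F, D, D)
    print(total, parts)
```

Keeping the supervised term as a separately weighted addend makes it easy to compare a pure CycleGAN (`lambda_disc = 0`) against the augmented objective, and against a purely supervised UNB-to-WB mapping (adversarial and cycle weights set to zero), which mirrors the comparison the abstract describes.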
Pages: 6780-6784
Page count: 5