DUAL APPLICATION OF SPEECH ENHANCEMENT FOR AUTOMATIC SPEECH RECOGNITION

Cited by: 19
Authors
Pandey, Ashutosh [1, 3]
Liu, Chunxi [2]
Wang, Yun [2]
Saraf, Yatharth [2]
Affiliations
[1] Ohio State University, Columbus, OH 43210 USA
[2] Facebook AI, Menlo Park, CA USA
[3] Facebook, Menlo Park, CA USA
Keywords
speech enhancement; speech recognition; recurrent neural network transducer; complex spectral mapping; consistency loss
DOI
10.1109/SLT48900.2021.9383624
Chinese Library Classification (CLC) number
TP18 [Theory of artificial intelligence]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
In this work, we exploit speech enhancement to improve an automatic speech recognition (ASR) system based on a recurrent neural network transducer (RNN-T). We employ a dense convolutional recurrent network (DCRN) for speech enhancement based on complex spectral mapping, and find it helpful for ASR in two ways: as a data augmentation technique and as a preprocessing frontend. For ASR data augmentation, we use a Kullback-Leibler (KL) divergence based consistency loss computed between the ASR outputs for the original and enhanced versions of each utterance. To make speech enhancement an effective ASR frontend, we propose a three-step training scheme based on model pretraining and feature selection. We evaluate the proposed techniques on a challenging dataset of English social media videos and achieve average relative word error rate (WER) improvements of 11.2% with enhancement-based data augmentation, 8.3% with enhancement-based preprocessing, and 13.4% when combining both.
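The consistency loss is described here only at a high level. As a minimal sketch, assuming the ASR model emits per-frame logits over a shared output vocabulary for both the original and the enhanced utterance (with equal frame counts, since enhancement preserves length), a frame-averaged KL divergence could be computed as follows. The function names, the direction KL(P_orig || P_enh), and the frame-level formulation are illustrative assumptions: an RNN-T actually produces outputs over a time-label lattice, and the paper's exact formulation may differ.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over one frame of logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def kl_consistency_loss(
    logits_orig: Sequence[Sequence[float]],
    logits_enh: Sequence[Sequence[float]],
) -> float:
    """Frame-averaged KL(P_orig || P_enh) between two logit sequences.

    ASSUMPTION (hypothetical formulation): both versions of the utterance
    yield the same number of frames over the same vocabulary, and the
    KL direction shown here is chosen for illustration only.
    """
    if len(logits_orig) != len(logits_enh):
        raise ValueError("frame counts differ")
    if not logits_orig:
        raise ValueError("empty input")
    total = 0.0
    for lo, le in zip(logits_orig, logits_enh):
        if len(lo) != len(le):
            raise ValueError("vocabulary sizes differ")
        log_p = log_softmax(lo)  # distribution for the original utterance
        log_q = log_softmax(le)  # distribution for the enhanced utterance
        # KL(p || q) = sum_v p_v * (log p_v - log q_v)
        total += sum(math.exp(a) * (a - b) for a, b in zip(log_p, log_q))
    return total / len(logits_orig)


# Identical outputs give zero loss; diverging outputs give a positive loss.
print(kl_consistency_loss([[2.0, 0.5, -1.0]], [[2.0, 0.5, -1.0]]))
print(kl_consistency_loss([[2.0, 0.5, -1.0]], [[-1.0, 0.5, 2.0]]))
```

In a real training loop this would be computed on framework tensors and added to the ASR loss, and one side is often treated as a fixed target with gradients stopped; whether the paper does so is not stated in this record.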
Pages: 223-228
Number of pages: 6