Using Deep Speech Recognition to Evaluate Speech Enhancement Methods

Cited by: 2
Authors
Siddiqui, Shamoon [1]
Rasool, Ghulam [1]
Ramachandran, Ravi P. [1]
Bouaynaya, Nidhal C. [1]
Affiliations
[1] Rowan Univ, Dept Elect & Comp Engn, Glassboro, NJ 08028 USA
Keywords
speech enhancement; distribution shift; signal-to-noise; benchmark; noise
DOI
10.1109/ijcnn48605.2020.9206817
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Progress in speech-related tasks depends on the quality of the speech signal being processed. While much progress has been made in various aspects of speech processing (including, but not limited to, speech recognition, language detection, and speaker diarization), the effect of enhancing a noise-corrupted speech signal on those tasks has not been rigorously evaluated. Speech enhancement aims to improve the signal-to-noise ratio of a noise-corrupted signal by boosting the speech elements (signal) and reducing the non-speech ones (noise). Speech enhancement techniques are evaluated using metrics that are either subjective (asking people their opinion of the enhanced signal) or objective (calculating metrics from the signal itself). Subjective measures are better indicators of improved quality but do not scale well to large datasets. Objective metrics have mostly been constructed to model the subjective results. Our goal in this work is to establish a benchmark that assesses the improvement from speech enhancement as it relates to the downstream task of automated speech recognition. In doing so, we retain the qualities of subjective measures while ensuring that evaluation can be done at a large scale in an automated fashion. We explore the impact of various noise types, including stationary noise, non-stationary noise, and a shift in noise distribution. We found that existing objective metrics are not a strong indicator of improvement in a downstream task. As such, we believe that Word Error Rate should be used when the downstream task is automated speech recognition.
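The two quantities the abstract relies on, signal-to-noise ratio and Word Error Rate, can be illustrated with a minimal Python sketch. This is not the authors' evaluation pipeline: the paper obtains transcripts from a deep speech recognizer, whereas here the transcripts are plain strings supplied by hand, and the function names `word_error_rate` and `snr_db` are hypothetical. WER is computed in the standard way, as the word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words.

```python
import math


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


def snr_db(clean, noise) -> float:
    """Signal-to-noise ratio in decibels from clean-speech and noise samples."""
    signal_power = sum(x * x for x in clean) / len(clean)
    noise_power = sum(x * x for x in noise) / len(noise)
    return 10.0 * math.log10(signal_power / noise_power)


if __name__ == "__main__":
    # Hand-written transcripts standing in for recognizer output.
    reference = "the cat sat on the mat"
    noisy_transcript = "the cat sit on mat"
    enhanced_transcript = "the cat sat on mat"
    print("WER (noisy):   ", word_error_rate(reference, noisy_transcript))
    print("WER (enhanced):", word_error_rate(reference, enhanced_transcript))
    print("SNR (dB):      ", snr_db([1.0, -1.0, 1.0, -1.0], [0.1, -0.1, 0.1, -0.1]))
```

Comparing the WER of a recognizer's transcript before and after enhancement gives a task-level measure of improvement, which is the kind of comparison the abstract argues for when the downstream task is speech recognition. Note that WER can exceed 1 when the hypothesis contains many inserted words.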
Pages: 7