Speech-Signal-Based Frequency Warping

Cited by: 13
Authors
Paliwal, Kuldip [1]
Shannon, Benjamin [1]
Lyons, James [1]
Wojcicki, Kamil [1]
Affiliations
[1] Griffith Univ, Signal Proc Lab, Nathan, Qld 4111, Australia
Keywords
Bark scale; mel scale; robust automatic speech recognition (ASR); speech-signal-based frequency cepstral coefficient (SFCC); speech-signal-based frequency warping
DOI
10.1109/LSP.2009.2014096
Chinese Library Classification
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification codes
0808; 0809
Abstract
The speech signal is used for transmission of linguistic information. High-energy portions of the speech spectrum have higher signal-to-noise ratios than low-energy portions and are therefore more robust to noise. Since the speech signal is known to be very robust to noise, the high-energy regions of the speech spectrum are expected to carry the majority of the linguistic information. This letter derives a frequency warping function directly from the speech signal by sampling the frequency axis nonuniformly, with the high-energy regions sampled more densely than the low-energy regions. To achieve this, an ensemble average short-time power spectrum is computed from a large speech corpus. The speech-signal-based frequency warping is obtained by dividing the log spectrum into equal-area portions. The proposed frequency warping is shown to be similar to the frequency scales obtained through psychoacoustic experiments, namely the mel and Bark scales. The warping is then used in filterbank design for automatic speech recognition experiments. The results of these experiments show that cepstral features based on the proposed warping achieve performance comparable to that of mel-frequency cepstral coefficients under clean conditions, while outperforming them under noisy conditions.
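The equal-area idea described in the abstract can be illustrated with a short sketch. This is not the authors' implementation: the function names, the Hamming window, the frame and hop sizes, the shift of the log spectrum to a zero minimum (so that areas are non-negative), and the trapezoidal integration are all assumptions made here for illustration.

```python
import numpy as np

def average_power_spectrum(signal, frame_len=256, hop=128):
    """Average the short-time power spectrum over all frames of a 1-D signal.

    Frame length, hop size and the Hamming window are illustrative choices.
    """
    window = np.hamming(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack(
        [signal[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    return np.mean(np.abs(np.fft.rfft(frames, axis=1)) ** 2, axis=0)

def equal_area_edges(avg_power, sample_rate, n_bands):
    """Return n_bands + 1 band edges (Hz) that split the log spectrum into
    equal-area portions, so high-energy regions receive narrower bands.
    """
    log_spec = np.log(avg_power + 1e-12)
    # Assumption: shift so the minimum is zero, keeping every area non-negative.
    log_spec = log_spec - log_spec.min()
    freqs = np.linspace(0.0, sample_rate / 2.0, len(avg_power))
    # Cumulative area under the shifted log spectrum (trapezoidal rule).
    steps = 0.5 * (log_spec[1:] + log_spec[:-1]) * np.diff(freqs)
    cum_area = np.concatenate(([0.0], np.cumsum(steps)))
    # Frequencies at which the cumulative area reaches equally spaced targets.
    targets = np.linspace(0.0, cum_area[-1], n_bands + 1)
    return np.interp(targets, cum_area, freqs)
```

Under these assumptions, the returned edges could serve as boundaries for a filterbank; with a spectrum whose energy falls with frequency, the bands come out narrow at low frequencies and wider at high frequencies, which is the qualitative behaviour of the mel and Bark scales.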
Pages: 319-322
Page count: 4