Speech-Signal-Based Frequency Warping

Cited by: 13
Authors
Paliwal, Kuldip [1]
Shannon, Benjamin [1]
Lyons, James [1]
Wojcicki, Kamil [1]
Affiliations
[1] Griffith Univ, Signal Proc Lab, Nathan, Qld 4111, Australia
Keywords
Bark scale; mel scale; robust automatic speech recognition (ASR); speech-signal-based frequency cepstral coefficient (SFCC); speech-signal-based frequency warping
DOI
10.1109/LSP.2009.2014096
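Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronic technology, communication technology]
Discipline classification code
0808; 0809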
Abstract
The speech signal is used for transmission of linguistic information. High energy portions of the speech spectrum have higher signal-to-noise ratios than the low energy portions. As a result, these regions are more robust to noise. Since the speech signal is known to be very robust to noise, it is expected that the high energy regions of the speech spectrum carry the majority of the linguistic information. This letter tries to derive a frequency warping function directly from the speech signal by sampling the frequency axis nonuniformly with the high energy regions sampled more densely than the low energy regions. To achieve this, an ensemble average short-time power spectrum is computed from a large speech corpus. The speech-signal-based frequency warping is obtained by considering equal area portions of the log spectrum. The proposed frequency warping is shown to be similar to the frequency scales obtained through psycho-acoustic experiments, namely the mel and bark scales. The warping is then used in filterbank design for automatic speech recognition experiments. The results of these experiments show that cepstral features based on the proposed warping achieve performance under clean conditions comparable to that of mel-frequency cepstral coefficients, while outperforming them under noisy conditions.
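The procedure outlined in the abstract (average the short-time power spectrum over a corpus, then split the log spectrum into equal-area portions) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the frame length, the hop size, the Hamming window, and in particular the shift of the log spectrum to make it non-negative before accumulating area are all assumptions made for illustration, since the abstract does not specify these details.

```python
import numpy as np


def average_log_power_spectrum(signal, frame_len=256, hop=128):
    """Log of the short-time power spectrum averaged over all frames.

    frame_len, hop and the Hamming window are illustrative assumptions.
    """
    window = np.hamming(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    acc = np.zeros(frame_len // 2 + 1)
    for i in range(n_frames):
        frame = signal[i * hop : i * hop + frame_len] * window
        acc += np.abs(np.fft.rfft(frame)) ** 2
    mean_power = acc / n_frames
    # Small constant avoids log(0) for empty bins.
    return np.log(mean_power + 1e-12)


def equal_area_warping(log_spec, sample_rate):
    """Cumulative area under the log spectrum, normalised to end at 1.

    Assumption: the log spectrum is shifted so its minimum is zero,
    which keeps the cumulative area monotonically non-decreasing.
    """
    shifted = log_spec - log_spec.min()
    area = np.cumsum(shifted)
    warp = area / area[-1]
    freqs = np.linspace(0.0, sample_rate / 2.0, len(log_spec))
    return freqs, warp


def band_edges(freqs, warp, n_bands):
    """Frequencies that split the warped axis into n_bands equal-area bands."""
    targets = np.linspace(0.0, 1.0, n_bands + 1)
    # Invert the warping curve by interpolating frequency against area.
    return np.interp(targets, warp, freqs)
```

Calling `average_log_power_spectrum` on a waveform, passing the result to `equal_area_warping`, and then calling `band_edges(freqs, warp, n_bands)` yields filterbank edge frequencies that are packed more densely where the average log spectrum is high. For a real corpus the power spectra would be accumulated across all utterances before taking the logarithm.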
Pages: 319-322
Number of pages: 4
Related papers
50 items in total
  • [31] Voice Conversion Based on Weighted Frequency Warping
    Erro, Daniel
    Moreno, Asuncion
    Bonafonte, Antonio
    IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2010, 18 (05): : 922 - 931
  • [32] LSP weighting functions based on spectral sensitivity and mel-frequency warping for speech recognition in digital communication
    Choi, SH
    Kim, HK
    Lee, HS
    ICASSP '99: 1999 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, PROCEEDINGS VOLS I-VI, 1999, : 401 - 404
  • [33] Frequency Warping Based on Mapping Formant Parameters
    Shuang, Zhi-Wei
    Bakis, Raimo
    Shechtman, Slava
    Chazan, Dan
    Qin, Yong
    INTERSPEECH 2006 AND 9TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, VOLS 1-5, 2006, : 2290 - 2293
  • [34] Enhanced block motion estimation based on frequency warping
    Akbulut, Orhan
    Urhan, Oguzhan
    Ertuerk, Sarp
    2007 IEEE 15TH SIGNAL PROCESSING AND COMMUNICATIONS APPLICATIONS, VOLS 1-3, 2007, : 1035 - 1037
  • [35] Multi-parameter Frequency Warping Based On LDA
    Liu, Gang
    Chu, Boce
    Fan, Ruchao
    2017 13TH INTERNATIONAL CONFERENCE ON NATURAL COMPUTATION, FUZZY SYSTEMS AND KNOWLEDGE DISCOVERY (ICNC-FSKD), 2017, : 2710 - 2714
  • [36] Frequency Warping Based On Two Factor Weighted Fusion
    Liu, Gang
    Chen, Haobin
    Fan, Ruchao
    2017 4TH INTERNATIONAL CONFERENCE ON SYSTEMS AND INFORMATICS (ICSAI), 2017, : 1082 - 1086
  • [37] Correlation-based Frequency Warping for Voice Conversion
    Tian, Xiaohai
    Wu, Zhizheng
    Lee, S. W.
    Chng, Eng Siong
    2014 9TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP), 2014, : 211 - +
  • [38] Effects of upper-frequency boundary and spectral warping on speech intelligibility in electrical stimulation
    Goupell, Matthew J.
    Laback, Bernhard
    Majdak, Piotr
    Baumgartner, Wolf-Dieter
    JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 2008, 123 (04): : 2295 - 2309
  • [40] SPARSE REPRESENTATION FOR FREQUENCY WARPING BASED VOICE CONVERSION
    Tian, Xiaohai
    Wu, Zhizheng
    Lee, Siu Wa
    Nguyen Quy Hy
    Chng, Eng Siong
    Dong, Minghui
    2015 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP), 2015, : 4235 - 4239