Parameterization of Dominant Spectral Peak Trajectory for Whisper Speech Recognition

Authors
Feng, Chang [1 ]
Wu, Xiaolong [2 ]
Xu, Mingxing [1 ]
Zheng, Thomas Fang [1 ]
Affiliations
[1] Tsinghua Univ, Ctr Speech & Language Technol, Beijing Natl Res Ctr Informat Sci & Technol, Beijing, Peoples R China
[2] Xinjiang Univ, Coll Informat Sci & Engn, Urumqi, Xinjiang, Peoples R China
Keywords
AUTOMATIC SPEECH;
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Automatic speech recognition (ASR) systems trained on normal speech generally suffer performance degradation on whisper speech. To address this problem, this paper exploits the factors that normal and whisper speech have in common to build a whisper speech recognizer from normal speech data. We propose to parameterize the dominant spectral peak trajectory (Ppeak) to capture these similarities and to concatenate it to the traditional Mel-Frequency Cepstral Coefficients (MFCC) and Human Factor Cepstral Coefficients (HFCC), respectively, to form new features. The proposed features improve the accuracy of whisper speech recognition. Performance improves further when the similarity is enhanced by removing low-frequency information. Experimental results show that the performance degradation between the matched and mismatched scenarios was reduced by a relative 90.31% in Word Error Rate (WER) for HFCC after similarity enhancement at a cut-off frequency of 500 Hz. Furthermore, we ultimately achieved a relative WER reduction of 69.60% in the mismatched scenario compared with conventional MFCC, even without any whisper speech data for training.
Pages: 911-916
Number of pages: 6
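
The abstract describes appending a dominant-spectral-peak feature to cepstral features and discarding content below a cut-off frequency. The record does not give the paper's actual parameterization, so the following is only a minimal sketch under stated assumptions: it takes the frequency of the strongest FFT bin at or above the cut-off in each frame as a stand-in for the peak trajectory. The function names, frame length, hop size, and Hamming window are hypothetical choices, not taken from the paper.

```python
import numpy as np

def dominant_peak_trajectory(signal, sample_rate, frame_len=400, hop=160,
                             cutoff_hz=500.0):
    """Per-frame frequency (Hz) of the strongest spectral bin >= cutoff_hz.

    Assumption: a plain argmax over the magnitude spectrum stands in for the
    paper's peak parameterization, which this record does not specify.
    """
    n_frames = 1 + (len(signal) - frame_len) // hop
    window = np.hamming(frame_len)
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / sample_rate)
    keep = freqs >= cutoff_hz          # drop low-frequency bins
    kept_freqs = freqs[keep]
    peaks = np.empty(n_frames)
    for i in range(n_frames):
        frame = signal[i * hop:i * hop + frame_len] * window
        magnitude = np.abs(np.fft.rfft(frame))
        peaks[i] = kept_freqs[np.argmax(magnitude[keep])]
    return peaks

def append_peak_feature(cepstra, peaks):
    """Concatenate the peak trajectory as one extra column per frame."""
    return np.hstack([cepstra, peaks[:, None]])

def relative_reduction(baseline, improved):
    """Relative reduction in percent, e.g. for comparing two WER values."""
    return 100.0 * (baseline - improved) / baseline

# Synthetic example: a strong 200 Hz tone plus a weaker 1000 Hz tone.
sr = 16000
t = np.arange(sr) / sr
x = np.sin(2 * np.pi * 200 * t) + 0.5 * np.sin(2 * np.pi * 1000 * t)
peaks = dominant_peak_trajectory(x, sr)        # cut-off at 500 Hz
cepstra = np.zeros((len(peaks), 13))           # placeholder for MFCC/HFCC
features = append_peak_feature(cepstra, peaks)
```

In this synthetic example the 200 Hz component is the strongest overall, but it is excluded by the 500 Hz cut-off, so the tracked peak follows the 1000 Hz component. The zero matrix is only a placeholder for real MFCC or HFCC frames.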