Cepstral representation of speech motivated by time-frequency masking: An application to speech recognition

Cited by: 12
Authors
Aikawa, K [1]
Singer, H [1]
Kawahara, H [1]
Tohkura, Y [1]
Affiliations
[1] ATR,INTERPRETING TELECOMMUN RES LABS,SEIKA,KYOTO 61902,JAPAN
DOI
10.1121/1.415961
Chinese Library Classification number
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
A new spectral representation incorporating time-frequency forward masking is proposed. This masked spectral representation is efficiently represented by a quefrency domain parameter called dynamic-cepstrum (DyC). Automatic speech recognition experiments have demonstrated that DyC powerfully improves performance in phoneme classification and phrase recognition. This new spectral representation simulates a perceived spectrum. It enhances formant transitions, which provide relevant cues for phoneme perception, while suppressing temporally stationary spectral properties, such as the effect of microphone frequency characteristics or speaker-dependent time-invariant spectral features. These properties are advantageous for speaker-independent speech recognition. DyC can efficiently represent both the instantaneous and transitional aspects of a running spectrum with a vector of the same size as a conventional cepstrum. DyC is calculated from a cepstrum time sequence using a matrix lifter. Each column vector of the matrix lifter performs spectral smoothing. Smoothing characteristics are a function of the time interval between a masker and a signal. DyC outperformed a conventional cepstrum parameter obtained through linear predictive coding (LPC) analysis for both phoneme classification and phrase recognition using hidden Markov models (HMMs). Compared with speaker-dependent recognition, an even greater improvement over the cepstrum parameter was found in speaker-independent speech recognition. Furthermore, DyC with only 16 coefficients exhibited higher speech recognition performance than a combination of the cepstrum and a delta-cepstrum with 32 coefficients in the classification experiment on phonemes contaminated by noise. © 1996 Acoustical Society of America.
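The matrix-lifter step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the Gaussian lifter shape, and the parameters alpha, beta, q0, and nu are assumptions chosen only for illustration. Each preceding cepstrum frame (the masker) is weighted by a lag-dependent lifter and subtracted from the current frame (the signal). Because liftering a cepstrum with a Gaussian window corresponds to smoothing the log spectrum, a lifter that narrows with lag smooths older maskers more strongly.

```python
import math

def make_matrix_lifter(num_coeffs, num_lags, alpha=0.2, beta=0.7, q0=12.0, nu=0.5):
    """Build one lifter weight vector per masker-signal lag.

    alpha, beta, q0 and nu are illustrative assumptions, not values from the paper.
    Each returned vector plays the role of one column of the matrix lifter.
    """
    lifter = []
    for n in range(1, num_lags + 1):
        gain = alpha * beta ** (n - 1)       # masking level decays with lag
        width = q0 / (1.0 + nu * (n - 1))    # narrower lifter = stronger spectral smoothing
        lifter.append([gain * math.exp(-0.5 * (k / width) ** 2)
                       for k in range(num_coeffs)])
    return lifter

def dynamic_cepstrum(cepstra, lifter):
    """Subtract liftered preceding frames from each cepstrum frame."""
    out = []
    for i, frame in enumerate(cepstra):
        masked = list(frame)
        for n, weights in enumerate(lifter, start=1):
            if i - n < 0:                    # no earlier frame available at this lag
                break
            prev = cepstra[i - n]
            for k, w in enumerate(weights):
                masked[k] -= w * prev[k]
        out.append(masked)
    return out

# A stationary cepstrum sequence: every frame is identical.
stationary = [[1.0, 0.5, -0.25, 0.1]] * 6
lifter = make_matrix_lifter(num_coeffs=4, num_lags=4)
dyc = dynamic_cepstrum(stationary, lifter)
```

In this toy run the first frame passes through unchanged because it has no preceding masker, while later frames of the stationary input are attenuated. That mirrors the abstract's claim that temporally stationary spectral properties are suppressed, and the output vector has the same size as the input cepstrum.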
Pages: 603-614
Page count: 12
Related papers
50 in total
  • [1] Time-frequency representation based cepstral processing for speech recognition
    Fineberg, AB
    Yu, KC
    [J]. 1996 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, CONFERENCE PROCEEDINGS, VOLS 1-6, 1996, : 25 - 28
  • [2] Time-Frequency Masking For Large Scale Robust Speech Recognition
    Wang, Yuxuan
    Misra, Ananya
    Chin, Kean K.
    [J]. 16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5, 2015, : 2469 - 2473
  • [3] On time-frequency masking in voiced speech
    Skoglund, J
    Kleijn, WB
    [J]. IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, 2000, 8 (04): : 361 - 369
  • [4] Label Driven Time-Frequency Masking for Robust Continuous Speech Recognition
    Soni, Meet
    Panda, Ashish
    [J]. INTERSPEECH 2019, 2019, : 426 - 430
  • [5] On the integration of time-frequency masking speech separation and recognition in underdetermined environments
    Jafari, Ingrid
    Haque, Serajul
    Togneri, Roberto
    Nordholm, Sven
    [J]. 2012 CONFERENCE RECORD OF THE FORTY SIXTH ASILOMAR CONFERENCE ON SIGNALS, SYSTEMS AND COMPUTERS (ASILOMAR), 2012, : 1613 - 1617
  • [6] Weighting Time-Frequency Representation of Speech using Auditory Saliency for Automatic Speech Recognition
    Cong-Thanh Do
    Stylianou, Yannis
    [J]. 19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, : 1591 - 1595
  • [7] Independent Component Analysis and Time-Frequency Masking for Speech Recognition in Multitalker Conditions
    Dorothea Kolossa
    Ramon Fernandez Astudillo
    Eugen Hoffmann
    Reinhold Orglmeister
    [J]. EURASIP Journal on Audio, Speech, and Music Processing, 2010
  • [8] Independent Component Analysis and Time-Frequency Masking for Speech Recognition in Multitalker Conditions
    Kolossa, Dorothea
    Astudillo, Ramon Fernandez
    Hoffmann, Eugen
    Orglmeister, Reinhold
    [J]. EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING, 2010,
  • [9] Maximizing environmental sound recognition and speech intelligibility using time-frequency masking
    Johnson, Eric M.
    Healy, Eric W.
    [J]. JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 2023, 153 (03):
  • [10] Label-Driven Time-Frequency Masking for Robust Speech Command Recognition
    Soni, Meet
    Sheikh, Imran
    Kopparapu, Sunil Kumar
    [J]. TEXT, SPEECH, AND DIALOGUE (TSD 2019), 2019, 11697 : 341 - 351