Cepstral representation of speech motivated by time-frequency masking: An application to speech recognition

Times cited: 12

Authors:

Aikawa, K [1]

Singer, H [1]

Kawahara, H [1]

Tohkura, Y [1]

Affiliation:

[1] ATR Interpreting Telecommunications Research Laboratories, Seika, Kyoto 61902, Japan

Source:

JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA | 1996 / Vol. 100 / No. 01

DOI:

10.1121/1.415961

Chinese Library Classification (CLC) number:

O42 [Acoustics]

Subject classification codes:

070206; 082403

Abstract:

A new spectral representation incorporating time-frequency forward masking is proposed. This masked spectral representation is efficiently represented by a quefrency-domain parameter called the dynamic cepstrum (DyC). Automatic speech recognition experiments demonstrated that DyC substantially improves performance in phoneme classification and phrase recognition. The new representation simulates a perceived spectrum: it enhances formant transitions, which provide relevant cues for phoneme perception, while suppressing temporally stationary spectral properties such as the effect of microphone frequency characteristics or speaker-dependent, time-invariant spectral features. These properties are advantageous for speaker-independent speech recognition. DyC can efficiently represent both the instantaneous and the transitional aspects of a running spectrum with a vector of the same size as a conventional cepstrum. DyC is calculated from a cepstrum time sequence using a matrix lifter. Each column vector of the matrix lifter performs spectral smoothing, and the smoothing characteristics are a function of the time interval between a masker and a signal. In both phoneme classification and phrase recognition using hidden Markov models (HMMs), DyC outperformed a conventional cepstrum parameter obtained through linear predictive coding (LPC) analysis. The improvement over the cepstrum parameter was even greater in speaker-independent recognition than in speaker-dependent recognition. Furthermore, in a classification experiment on phonemes contaminated by noise, DyC with only 16 coefficients achieved higher performance than a combination of the cepstrum and the delta-cepstrum with 32 coefficients. (C) 1996 Acoustical Society of America.
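The abstract describes DyC as a cepstrum time sequence passed through a matrix lifter whose columns each smooth the spectrum, with smoothing that depends on the masker-signal interval. The NumPy sketch below illustrates that structure only, under loudly stated assumptions: it subtracts a smoothed, temporally decaying sum of past cepstra (a model forward-masking pattern) from the current cepstrum. The Gaussian lifter shape, the exponential temporal weights, the parameters `alpha`, `tau`, `q0`, `shrink`, the zero padding before the first frame, and the function names `masking_lifter_matrix` and `dynamic_cepstrum` are all hypothetical; they are not the authors' published lifter or code.

```python
import numpy as np


def masking_lifter_matrix(n_ceps, n_delays, alpha=0.6, tau=2.0, q0=12.0, shrink=0.7):
    """Hypothetical matrix lifter of shape (n_ceps, n_delays + 1).

    Column 0 passes the current frame through unchanged. Column k >= 1 is a
    Gaussian lowpass lifter (i.e. spectral smoothing) scaled by -w_k, so the
    smoothed past spectrum is subtracted as a forward-masking pattern.
    All parameter values are illustrative assumptions, not the paper's.
    """
    n = np.arange(n_ceps)
    A = np.zeros((n_ceps, n_delays + 1))
    A[:, 0] = 1.0
    # Assumed: temporal weights decay with the masker-signal interval and sum to alpha.
    w = np.exp(-np.arange(n_delays) / tau)
    w = alpha * w / w.sum()
    for k in range(1, n_delays + 1):
        # Assumed: the cutoff quefrency shrinks as the interval grows,
        # so older maskers are smoothed more heavily.
        cutoff = q0 * shrink ** (k - 1)
        A[:, k] = -w[k - 1] * np.exp(-0.5 * (n / cutoff) ** 2)
    return A


def dynamic_cepstrum(C, A):
    """Apply matrix lifter A to a cepstrum sequence C of shape (frames, n_ceps)."""
    C = np.asarray(C, dtype=float)
    T, n_ceps = C.shape
    n_delays = A.shape[1] - 1
    # Assumed: zero-pad the past, i.e. no masker exists before the first frame.
    padded = np.vstack([np.zeros((n_delays, n_ceps)), C])
    out = np.empty_like(C)
    for t in range(T):
        # Row j of `window` is frame t - j, for j = 0..n_delays.
        window = padded[t + n_delays - np.arange(n_delays + 1)]
        # Each quefrency n combines its own history: sum_j A[n, j] * C[t - j, n].
        out[t] = np.sum(A.T * window, axis=0)
    return out


rng = np.random.default_rng(0)
C = rng.standard_normal((50, 16))        # 50 frames of a 16-coefficient cepstrum
A = masking_lifter_matrix(16, n_delays=4)
D = dynamic_cepstrum(C, A)
print(D.shape)                           # (50, 16): same size as the input cepstrum

# A constant cepstral offset (e.g. a fixed channel response) added to every frame.
offset = rng.standard_normal(16)
delta = dynamic_cepstrum(C + offset, A) - D
gain = A.sum(axis=1)                     # steady-state gain applied to the offset
print(np.allclose(delta[4:], gain * offset), gain[0])  # gain[0] is approximately 1 - alpha
```

Because the operation is linear, a constant cepstral offset on every frame changes the output, once the delay buffer has filled, only by `gain * offset`. In this sketch the gain runs from about `1 - alpha` at quefrency 0 toward 1 at high quefrencies, so a smooth, stationary spectral tilt is attenuated rather than removed outright, while frame-to-frame changes pass through the unsmoothed column 0.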


Pages: 603-614

Page count: 12
