Significance of Joint Features Derived from the Modified Group Delay Function in Speech Processing

Authors:

Rajesh M. Hegde

Hema A. Murthy

V. R. R. Gadde

Affiliations:

[1] University of California San Diego, Department of Electrical and Computer Engineering

[2] Indian Institute of Technology Madras, Department of Computer Science and Engineering

[3] SRI International, STAR Lab

Source:

EURASIP Journal on Audio, Speech, and Music Processing, vol. 2007

Keywords:

Acoustics; Speech Recognition; Group Delay; Conventional Group; Resonant Structure

Abstract:

This paper investigates the significance of combining cepstral features derived from the modified group delay function with features derived from the short-time spectral magnitude, such as the MFCC. The conventional group delay function fails to capture the resonant structure and the dynamic range of the speech spectrum, primarily because pitch periodicity effects introduce spikes. The group delay function is modified to suppress these spikes and to restore the dynamic range of the speech spectrum. The cepstral features derived from the modified group delay function are called the modified group delay feature (MODGDF). The complementarity and robustness of the MODGDF relative to the MFCC are also analyzed using spectral reconstruction techniques. The combination of several spectral magnitude-based features with the MODGDF, using feature fusion and likelihood combination, is described. These features are then used for three speech processing tasks: syllable, speaker, and language recognition. Results indicate that combining the MODGDF with the MFCC at the feature level gives significant improvements for speech recognition tasks in noise. Combining the MODGDF with spectral magnitude-based features improves recognition performance significantly, by up to 11%, while combining any two features derived from the spectral magnitude gives no significant improvement.
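
To make the starting point of the abstract concrete, the following is a minimal sketch of the conventional (unmodified) group delay function, computed with the standard identity tau = (X_R*Y_R + X_I*Y_I) / |X|^2, where X is the DFT of x[n] and Y is the DFT of n*x[n]. It is not the authors' implementation and does not include the modification that yields the MODGDF. The helper names `dft` and `group_delay` and the guard value `eps` are assumptions made for illustration only.

```python
import cmath


def dft(x):
    """Naive O(N^2) discrete Fourier transform of a sequence."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def group_delay(x, eps=1e-12):
    """Conventional group delay of x at each DFT bin, in samples.

    Uses tau[k] = (X_R*Y_R + X_I*Y_I) / |X|^2 with X = DFT(x[n])
    and Y = DFT(n * x[n]).
    """
    X = dft(x)
    Y = dft([t * v for t, v in enumerate(x)])
    tau = []
    for xk, yk in zip(X, Y):
        power = xk.real ** 2 + xk.imag ** 2
        # eps is a hypothetical guard against division by zero; it is
        # not part of the identity itself.
        tau.append((xk.real * yk.real + xk.imag * yk.imag) / max(power, eps))
    return tau


# A unit impulse delayed by 3 samples has a flat group delay of 3 samples.
impulse = [0.0] * 16
impulse[3] = 1.0
delays = group_delay(impulse)
```

The denominator |X|^2 shows why the conventional function can misbehave: wherever the spectral magnitude is close to zero, the ratio becomes very large. The modification described in the abstract targets spikes of this kind; its exact form is given in the paper, not in this sketch.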
