Significance of Joint Features Derived from the Modified Group Delay Function in Speech Processing

Cited by: 0
Authors
Rajesh M. Hegde
Hema A. Murthy
V. R. R. Gadde
Affiliations
[1] University of California San Diego,Department of Electrical and Computer Engineering
[2] Indian Institute of Technology Madras,Department of Computer Science and Engineering
[3] SRI International,STAR Lab
Keywords
Acoustics; Speech Recognition; Group Delay; Conventional Group; Resonant Structure;
DOI
Not available
Abstract
This paper investigates the significance of combining cepstral features derived from the modified group delay function with features derived from the short-time spectral magnitude, such as the MFCC. The conventional group delay function fails to capture the resonant structure and the dynamic range of the speech spectrum, primarily because pitch periodicity effects introduce spikes into it. The group delay function is modified to suppress these spikes and to restore the dynamic range of the speech spectrum. Cepstral features derived from the modified group delay function are called the modified group delay feature (MODGDF). The complementarity and robustness of the MODGDF relative to the MFCC are analyzed using spectral reconstruction techniques. The combination of several spectral magnitude-based features with the MODGDF, using feature fusion and likelihood combination, is described. These features are then used for three speech processing tasks: syllable, speaker, and language recognition. Results indicate that combining the MODGDF with the MFCC at the feature level gives significant improvements for speech recognition tasks in noise. Combining the MODGDF with spectral magnitude-based features gives a significant increase in recognition performance, of 11% at best, whereas combining any two features derived from the spectral magnitude gives no significant improvement.
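The abstract gives no formulas, so the following is only a minimal sketch of how a modified group delay feature might be computed for a single frame, following the commonly cited formulation: the group delay numerator is divided by a cepstrally smoothed spectrum raised to the power 2γ, the result is compressed with an exponent α, and a DCT yields cepstral coefficients. The function names, the default values of `alpha`, `gamma`, `lifter`, and `n_ceps`, and the two combination helpers (frame-wise concatenation for feature fusion, a weighted sum of log-likelihoods for likelihood combination) are assumptions made for illustration and are not taken from this record.

```python
import numpy as np

def modgd_cepstra(frame, n_fft=512, lifter=8, alpha=0.4, gamma=0.9, n_ceps=13):
    """Sketch: modified group delay spectrum and cepstra for one frame.

    All default parameter values are illustrative assumptions.
    """
    frame = np.asarray(frame, dtype=float)
    n = np.arange(len(frame))
    X = np.fft.rfft(frame, n_fft)       # spectrum of x(n)
    Y = np.fft.rfft(n * frame, n_fft)   # spectrum of n * x(n)

    # Cepstrally smoothed magnitude spectrum S: keep only low quefrencies.
    log_mag = np.log(np.abs(X) + 1e-10)
    cepstrum = np.fft.irfft(log_mag, n_fft)
    window = np.zeros(n_fft)
    window[:lifter] = 1.0
    window[n_fft - lifter + 1:] = 1.0   # symmetric half of the lifter
    S = np.exp(np.fft.rfft(cepstrum * window).real)

    # Group delay numerator over the smoothed spectrum (instead of |X|^2),
    # which avoids the spikes caused by near-zero values of |X|.
    tau = (X.real * Y.real + X.imag * Y.imag) / (S ** (2.0 * gamma))

    # Compress the dynamic range while preserving the sign.
    tau_m = np.sign(tau) * np.abs(tau) ** alpha

    # DCT-II of the modified group delay spectrum gives the cepstra.
    K = len(tau_m)
    k = np.arange(K)
    basis = np.cos(np.pi * np.outer(np.arange(n_ceps), 2 * k + 1) / (2 * K))
    return tau_m, basis @ tau_m

def fuse_features(mfcc, modgdf):
    """Feature-level fusion, assumed here to be per-frame concatenation."""
    return np.concatenate([mfcc, modgdf], axis=-1)

def combine_log_likelihoods(ll_a, ll_b, weight=0.5):
    """Likelihood combination, assumed here to be a weighted log-likelihood sum."""
    return weight * np.asarray(ll_a) + (1.0 - weight) * np.asarray(ll_b)
```

With `alpha=1.0` and `gamma=1.0`, a unit impulse delayed by five samples produces a flat group delay of about five samples at every frequency bin, which is a convenient sanity check on the numerator and the smoothing step.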
Related papers
50 in total
  • [21] Application of the modified group delay function to speaker identification and discrimination
    Hegde, RM
    Murthy, HA
    Rao, GVR
    2004 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL I, PROCEEDINGS: SPEECH PROCESSING, 2004, : 517 - +
  • [22] Music Genre Classification by Fusion of Modified Group Delay and Melodic Features
    Rajan, Rajeev
    Murthy, Hema A.
    2017 TWENTY-THIRD NATIONAL CONFERENCE ON COMMUNICATIONS (NCC), 2017,
  • [23] Processing group delay spectrograms for study of formant and harmonic contours in speech signals
    Yegnanarayana, B.
    Pannala, Vishala
    JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 2024, 156 (04): : 2422 - 2433
  • [24] Significance of variable height-bandwidth group delay filters in the spectral reconstruction of speech
    Arya, Devanshu
    Raj, Anant
    Hegde, Rajesh M.
    14TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2013), VOLS 1-5, 2013, : 1681 - 1685
  • [25] Formant extraction from group delay function
    MURTHY, HA
    YEGNANARAYANA, B
    SPEECH COMMUNICATION, 1991, 10 (03) : 209 - 221
  • [26] Estimation of poles and zeros of voiced speech using group delay characteristics derived from spectral envelopes
    Mikami, Naoki
    Ohba, Ryoji
    Electronics and Communications in Japan, Part I: Communications (English translation of Denshi Tsushin Gakkai Ronbunshi), 1986, 69 (03): : 38 - 44
  • [27] Robust pitch estimation in noisy speech using ZTW and group delay function
    Prasad, RaviShankar
    Yegnanarayana, B.
    16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5, 2015, : 3289 - 3292
  • [28] Modified Group Delay Based Features for Asthma and HIE Infant Cries Classification
    Chittora, Anshu
    Patil, Hemant A.
    TEXT, SPEECH, AND DIALOGUE (TSD 2015), 2015, 9302 : 595 - 602
  • [29] Tonic Pitch Estimation in Turkish Music Using Modified Group Delay Processing
    Rajeev, Rajan
    Aiswarya, M. A.
    CIRCUITS SYSTEMS AND SIGNAL PROCESSING, 2024, 43 (10) : 6459 - 6474
  • [30] Determination of instants of significant excitation in speech using Hilbert envelope and group delay function
    Rao, K. Sreenivasa
    Prasanna, S. R. Mahadeva
    Yegnanarayana, B.
    IEEE SIGNAL PROCESSING LETTERS, 2007, 14 (10) : 762 - 765