Significance of incorporating excitation source parameters for improved emotion recognition from speech and electroglottographic signals

Cited by: 15
Authors
Pravena D. [1]
Govind D. [1]
Affiliations
[1] Center for Computational Engineering and Networking (CEN), Amrita School of Engineering, Amrita Vishwa Vidyapeetham, Amrita University, Coimbatore
Source
Springer Science and Business Media, LLC, Vol. 20; corresponding author: Govind, D. (d_govind@cb.amrita.edu)
Keywords
Emotion recognition; Excitation source parameters; Strength of excitation; Zero frequency filtering;
DOI
10.1007/s10772-017-9445-x
Abstract
The work presented in this paper explores the effectiveness of incorporating excitation source parameters, such as the strength of excitation and the instantaneous fundamental frequency (F0), for the emotion recognition task from speech and electroglottographic (EGG) signals. The strength of excitation (SoE) is an important parameter indicating the pressure with which the glottis closes at the glottal closure instants (GCIs). The SoE is computed by the popular zero frequency filtering (ZFF) method, which accurately estimates the glottal signal characteristics by attenuating or removing the high-frequency vocal-tract interactions in speech. An impulse sequence, obtained from the estimated GCIs, is used to derive the instantaneous F0. The SoE and instantaneous F0 parameters are combined with conventional mel frequency cepstral coefficients (MFCC) to improve the recognition rates of distinct emotions (Anger, Happy and Sad) using Gaussian mixture models as the classifier. The performance of the proposed combination of SoE, instantaneous F0 and their dynamic features with MFCC coefficients is evaluated on emotion utterances from the classical German full-blown emotion speech database (EmoDb; 4 emotions and neutral), which provides simultaneous speech and EGG signals, and from the Surrey Audio-Visual Expressed Emotion database (3 emotions and neutral), for both speaker-dependent and speaker-independent emotion recognition scenarios. To reinforce the effectiveness of the proposed features and to ensure better statistical consistency of the emotion analysis, a fairly large emotion speech database of 220 utterances per emotion in the Tamil language, with simultaneous EGG recordings, is used in addition to EmoDb. The effectiveness of SoE and instantaneous F0 in characterizing different emotions is further confirmed by the improved emotion recognition performance on the Tamil speech-EGG emotion database. © 2017, Springer Science+Business Media, LLC.
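The ZFF pipeline summarized in the abstract (zero-frequency resonance, trend removal, GCI detection at zero crossings, SoE as the slope at each crossing, and instantaneous F0 from successive GCI intervals) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the window length, sampling rate and the synthetic test signal are assumptions chosen for demonstration.

```python
import numpy as np

def zff(x, fs, win_ms=10):
    """Zero frequency filtering: double integration plus trend removal."""
    # Difference the signal to remove any DC offset
    d = np.diff(x, prepend=x[0])
    # Cascade of two zero-frequency resonators (double integration)
    y = np.cumsum(np.cumsum(d))
    # Trend removal: subtract a local mean twice; the window should be
    # on the order of the average pitch period (10 ms assumed here)
    w = int(win_ms * fs / 1000)
    kernel = np.ones(2 * w + 1) / (2 * w + 1)
    for _ in range(2):
        y = y - np.convolve(y, kernel, mode="same")
    return y

def gci_soe_f0(y, fs):
    """GCIs, strength of excitation and instantaneous F0 from ZFF output."""
    # GCIs: negative-to-positive zero crossings of the ZFF output
    gci = np.where((y[:-1] < 0) & (y[1:] >= 0))[0]
    # SoE: slope of the ZFF output at each zero crossing
    soe = y[gci + 1] - y[gci]
    # Instantaneous F0 from the intervals between successive GCIs
    f0 = fs / np.diff(gci) if len(gci) > 1 else np.array([])
    return gci, soe, f0

# Synthetic voiced-like signal at 120 Hz (illustrative, not real speech)
fs = 8000
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 120 * t) + 0.3 * np.sin(2 * np.pi * 240 * t)

y = zff(x, fs)
gci, soe, f0 = gci_soe_f0(y, fs)
```

For this synthetic input the recovered instantaneous F0 clusters around the 120 Hz fundamental; on real speech, the trend-removal window would be tied to an estimate of the speaker's average pitch period.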
Pages: 787-797
Page count: 10
Related papers
50 records in total
  • [1] Emotion Recognition from Speech Signals using Excitation Source and Spectral Features
    Choudhury, Akash Roy
    Ghosh, Anik
    Pandey, Rahul
    Barman, Subhas
    [J]. PROCEEDINGS OF 2018 IEEE APPLIED SIGNAL PROCESSING CONFERENCE (ASPCON), 2018, : 257 - 261
  • [2] Exploring the Significance of Low Frequency Regions in Electroglottographic Signals for Emotion Recognition
    Ajay, S. G.
    Pravena, D.
    Govind, D.
    Pradeep, D.
    [J]. ADVANCES IN SIGNAL PROCESSING AND INTELLIGENT RECOGNITION SYSTEMS, 2018, 678 : 319 - 327
  • [3] Analysis of Excitation Source Features of Speech for Emotion Recognition
    Kadiri, Sudarsana Reddy
    Gangamohan, P.
    Gangashetty, Suryakanth V.
    Yegnanarayana, B.
    [J]. 16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5, 2015, : 1324 - 1328
  • [4] Emotion recognition from Mandarin speech signals
    Pao, TL
    Chen, YT
    Yeh, JH
    [J]. 2004 INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING, PROCEEDINGS, 2004, : 301 - 304
  • [5] Emotion recognition and evaluation from Mandarin speech signals
    Pao, Tsanglong
    Chen, Yute
    Yeh, Junheng
    [J]. INTERNATIONAL JOURNAL OF INNOVATIVE COMPUTING INFORMATION AND CONTROL, 2008, 4 (07): : 1695 - 1709
  • [6] Improving Automatic Emotion Recognition from Speech Signals
    Bozkurt, Elif
    Erzin, Engin
    Erdem, Cigdem Eroglu
    Erdem, A. Tanju
    [J]. INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, VOLS 1-5, 2009, : 312 - +
  • [7] The amalgamation of wavelet packet information gain entropy tuned source and system parameters for improved speech emotion recognition
    Palo, Hemanta Kumar
    Subudhiray, Swapna
    Das, Niva
    [J]. SPEECH COMMUNICATION, 2023, 149 : 11 - 28
  • [8] Emotion Recognition from Semi Natural Speech Using Artificial Neural Networks and Excitation Source Features
    Koolagudi, Shashidhar G.
    Devliyal, Swati
    Barthwal, Anurag
    Rao, K. Sreenivasa
    [J]. CONTEMPORARY COMPUTING, 2012, 306 : 273 - +
  • [9] AUTOMATIC GLOTTAL INVERSE FILTERING FROM SPEECH AND ELECTROGLOTTOGRAPHIC SIGNALS
    VEENEMAN, DE
    BEMENT, SL
    [J]. IEEE TRANSACTIONS ON ACOUSTICS SPEECH AND SIGNAL PROCESSING, 1985, 33 (02): : 369 - 377
  • [10] Significance of Phonological Features in Speech Emotion Recognition
    Wei Wang
    Paul A. Watters
    Xinyi Cao
    Lingjie Shen
    Bo Li
    [J]. International Journal of Speech Technology, 2020, 23 : 633 - 642