Predicting Arousal and Valence from Waveforms and Spectrograms using Deep Neural Networks

Cited by: 28
Authors
Yang, Zixiaofan [1]
Hirschberg, Julia [1]
Affiliation
[1] Columbia Univ, New York, NY 10027 USA
Keywords
speech emotion recognition; computational paralinguistics; deep learning; EMOTION RECOGNITION; SPEECH; FEATURES;
DOI
10.21437/Interspeech.2018-2397
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Automatic recognition of spontaneous emotion in conversational speech is an important yet challenging problem. In this paper, we propose a deep neural network model to track continuous emotion changes in the arousal-valence two-dimensional space by combining inputs from raw waveform signals and spectrograms, both of which have been shown to be useful in the emotion recognition task. The neural network architecture contains a set of convolutional neural network (CNN) layers and bidirectional long short-term memory (BLSTM) layers to account for both temporal and spectral variation and to model contextual content. Experimental results of predicting valence and arousal on the SEMAINE database and the RECOLA database show that the proposed model significantly outperforms a model using hand-engineered features, by exploiting waveforms and spectrograms as input. We also compare the effects of waveforms vs. spectrograms and find that waveforms are better at capturing arousal, while spectrograms are better at capturing valence. Moreover, combining information from both inputs provides further improvement to the performance.
Pages: 3092-3096
Number of pages: 5
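
The abstract names two input representations for the same speech signal: the raw waveform and a spectrogram derived from it. As a rough illustration of how the second is obtained from the first, here is a minimal sketch in plain Python. It is not the authors' implementation. The frame length, hop size, Hann window, sample rate, and the helper names `frame_signal` and `magnitude_spectrogram` are all assumptions chosen for the example, since the abstract does not state the paper's preprocessing settings.

```python
import cmath
import math


def frame_signal(samples, frame_len, hop):
    """Split a 1-D list of samples into overlapping frames (no padding)."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]


def magnitude_spectrogram(samples, frame_len=64, hop=32):
    """Hann-windowed magnitude spectrogram computed with a naive DFT.

    frame_len and hop are illustrative defaults, not values from the paper.
    Returns a list of frames, each a list of frame_len // 2 + 1 magnitudes.
    """
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len)
              for n in range(frame_len)]
    n_bins = frame_len // 2 + 1
    spec = []
    for frame in frame_signal(samples, frame_len, hop):
        windowed = [x * w for x, w in zip(frame, window)]
        row = []
        for k in range(n_bins):
            # Direct DFT sum for frequency bin k of this frame.
            acc = sum(x * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n, x in enumerate(windowed))
            row.append(abs(acc))
        spec.append(row)
    return spec


# Toy waveform (assumed values): 0.1 s of a 1000 Hz sine sampled at 8 kHz.
sample_rate = 8000
wave = [math.sin(2 * math.pi * 1000 * t / sample_rate) for t in range(800)]

# Time-frequency view of the same signal: one row of magnitudes per frame.
spec = magnitude_spectrogram(wave)
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
```

The waveform keeps every sample in time order, while each spectrogram row summarizes the frequency content of one short frame. In practice a library FFT routine would replace the naive DFT loop, which is used here only to keep the sketch dependency-free.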