Speech Emotion Recognition Using Spectrogram & Phoneme Embedding

Cited by: 109
Authors
Yenigalla, Promod [1]
Kumar, Abhay [1]
Tripathi, Suraj [1]
Singh, Chirag [1]
Kar, Sibsambhu [1]
Vepa, Jithendra [1]
Affiliations
[1] Samsung R&D Inst India, Bangalore, Karnataka, India
Keywords
Spectrogram; phoneme; phoneme embedding; speech emotion recognition; CNN
DOI
10.21437/Interspeech.2018-1811
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This paper proposes a speech emotion recognition method based on phoneme sequences and spectrograms. Both retain emotional content of speech that is lost when the speech is converted into text. We performed experiments with several kinds of deep neural networks that take phonemes and spectrograms as inputs. Three of those network architectures are presented here; they achieved better accuracy than the state-of-the-art methods on a benchmark dataset. A combined phoneme and spectrogram CNN model proved most accurate in recognizing emotions on the IEMOCAP data. We achieved an increase of more than 4% in both overall accuracy and average class accuracy compared with the existing state-of-the-art methods.
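The abstract reports two metrics, overall accuracy and average class accuracy, without defining them. The sketch below assumes the usual reading in emotion recognition work: overall accuracy is the fraction of all utterances classified correctly, and average class accuracy is the unweighted mean of the per-class accuracies. The function names and the toy labels are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def overall_accuracy(y_true, y_pred):
    """Fraction of all utterances whose predicted label equals the true label."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def average_class_accuracy(y_true, y_pred):
    """Unweighted mean of per-class accuracies (each class counts equally)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")
    totals = defaultdict(int)  # utterances per true class
    hits = defaultdict(int)    # correctly classified utterances per true class
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        if t == p:
            hits[t] += 1
    return sum(hits[c] / totals[c] for c in totals) / len(totals)


# Hypothetical, imbalanced toy labels (not data from the paper).
y_true = ["neutral"] * 6 + ["happy"] * 2 + ["sad"] * 2
y_pred = ["neutral"] * 6 + ["happy", "neutral"] + ["neutral", "neutral"]

print(overall_accuracy(y_true, y_pred))        # 7 of 10 utterances correct
print(average_class_accuracy(y_true, y_pred))  # mean of 1.0, 0.5 and 0.0
```

On imbalanced data the two numbers can diverge sharply, as in this toy case where the majority class dominates overall accuracy. That is one reason a result is more convincing when both metrics improve together.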
Pages: 3688-3692
Page count: 5
Related papers
50 in total
  • [1] Phoneme recognition using speech image (spectrogram)
    Ahmadi, M
    Bailey, NJ
    Hoyle, BS
    [J]. ICSP '96 - 1996 3RD INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING, PROCEEDINGS, VOLS I AND II, 1996: 675-677
  • [2] Emotion recognition based on AlexNet using speech spectrogram
    Park, Soeun
    Lee, Chul
    Kwon, Soonil
    Park, Neungsoo
    [J]. BASIC & CLINICAL PHARMACOLOGY & TOXICOLOGY, 2018, 123: 49-49
  • [3] Detecting Human Emotion via Speech Recognition by Using Speech Spectrogram
    Prasomphan, Sathit
    [J]. PROCEEDINGS OF THE 2015 IEEE INTERNATIONAL CONFERENCE ON DATA SCIENCE AND ADVANCED ANALYTICS (IEEE DSAA 2015), 2015: 113-122
  • [4] Speech Emotion Recognition Using Auditory Spectrogram and Cepstral Features
    Zhao, Shujie
    Yang, Yan
    Cohen, Israel
    Zhang, Lijun
    [J]. 29TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO 2021), 2021: 136-140
  • [5] Improvement Of Speech Emotion Recognition with Neural Network Classifier by Using Speech Spectrogram
    Prasomphan, Sathit
    [J]. 2015 INTERNATIONAL CONFERENCE ON SYSTEMS, SIGNALS AND IMAGE PROCESSING (IWSSIP 2015), 2015: 73-76
  • [6] Speech Emotion Recognition Using Speech Feature and Word Embedding
    Atmaja, Bagus Tris
    Shirai, Kiyoaki
    Akagi, Masato
    [J]. 2019 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2019: 519-523
  • [7] Autoencoder With Emotion Embedding for Speech Emotion Recognition
    Zhang, Chenghao
    Xue, Lei
    [J]. IEEE ACCESS, 2021, 9: 51231-51241
  • [8] Autoencoder with emotion embedding for speech emotion recognition
    Zhang, Chenghao
    Xue, Lei
    [J]. IEEE Access, 2021, 9: 51231-51241
  • [9] Scalogram vs Spectrogram as Speech Representation Inputs for Speech Emotion Recognition Using CNN
    Enriquez, Marc Dominic
    Lucas, Crisron Rudolf
    Aquino, Angelina
    [J]. 2023 34TH IRISH SIGNALS AND SYSTEMS CONFERENCE, ISSC, 2023,
  • [10] Experimental Analysis and Selection of Spectrogram Features for Speech Emotion Recognition
    Tang, Gui-Chen
    Liang, Rui-Yu
    Feng, Yue-Qin
    Wang, Qing-Yun
    [J]. INTERNATIONAL CONFERENCE ON MECHANICS, BUILDING MATERIAL AND CIVIL ENGINEERING (MBMCE 2015), 2015: 757-762