Speech Emotion Recognition Using Spectrogram & Phoneme Embedding

Cited by: 109
Authors
Yenigalla, Promod [1]
Kumar, Abhay [1]
Tripathi, Suraj [1]
Singh, Chirag [1]
Kar, Sibsambhu [1]
Vepa, Jithendra [1]
Affiliations
[1] Samsung R&D Inst India, Bangalore, Karnataka, India
Keywords
Spectrogram; phoneme; phoneme embedding; speech emotion recognition; CNN
DOI
10.21437/Interspeech.2018-1811
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This paper proposes a speech emotion recognition method based on phoneme sequences and spectrograms. Both retain emotional content of speech that is lost when the speech is converted to text. We performed experiments with several kinds of deep neural networks that take phonemes and spectrograms as inputs. Three of these network architectures are presented here; they achieved better accuracy than the state-of-the-art methods on a benchmark dataset. A combined phoneme and spectrogram CNN model proved most accurate in recognizing emotions on IEMOCAP data. We achieved an increase of more than 4% in overall accuracy and average class accuracy compared with the existing state-of-the-art methods.
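The abstract reports two metrics, overall accuracy and average class accuracy. The sketch below shows how they differ, assuming the usual definitions (fraction of all utterances classified correctly, and the unweighted mean of per-class accuracies); the record itself does not spell these out. The function names and the toy labels are hypothetical and are not taken from the paper or from IEMOCAP.

```python
from collections import defaultdict


def overall_accuracy(y_true, y_pred):
    """Fraction of all samples whose predicted label matches the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def average_class_accuracy(y_true, y_pred):
    """Unweighted mean of per-class accuracies (each class counts equally)."""
    totals = defaultdict(int)  # samples per true class
    hits = defaultdict(int)    # correctly classified samples per true class
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        hits[t] += int(t == p)
    return sum(hits[c] / totals[c] for c in totals) / len(totals)


# Toy, imbalanced labels for illustration only (not IEMOCAP data).
y_true = ["neutral"] * 6 + ["happy"] * 2 + ["sad"] * 2
y_pred = ["neutral"] * 6 + ["happy", "neutral"] + ["sad", "neutral"]

print(overall_accuracy(y_true, y_pred))        # 8 of 10 samples correct
print(average_class_accuracy(y_true, y_pred))  # mean of 1.0, 0.5 and 0.5
```

On imbalanced data the two numbers can diverge considerably, which is presumably why the abstract reports both.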
Pages: 3688-3692
Page count: 5