Speech emotion recognition via graph-based representations

Cited by: 4
Authors
Pentari, Anastasia [1 ]
Kafentzis, George [2 ]
Tsiknakis, Manolis [1 ,3 ]
Affiliations
[1] Fdn Res & Technol Hellas, Inst Comp Sci, GR-70013 Iraklion, Greece
[2] Univ Crete, Comp Sci Dept, GR-70013 Iraklion, Greece
[3] Hellen Mediterranean Univ, Dept Elect & Comp Engn, Iraklion, Greece
Keywords
FEATURES;
DOI
10.1038/s41598-024-52989-2
Chinese Library Classification
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences];
Subject Classification Codes
07; 0710; 09;
Abstract
Speech emotion recognition (SER) has gained increased interest over the last decades as part of affective computing. Consequently, a variety of engineering approaches have been developed to address the SER problem, exploiting different features, learning algorithms, and datasets. In this paper, we propose applying graph theory to classify emotionally colored speech signals. Graph theory provides tools for extracting statistical as well as structural information from any time series; we propose to use this information as a novel feature set. Furthermore, we suggest setting a unique feature-based identity for each emotion of each speaker. Emotion classification is performed by a Random Forest classifier in a Leave-One-Speaker-Out Cross-Validation (LOSO-CV) scheme. The proposed method is compared with two state-of-the-art approaches involving well-known hand-crafted features as well as deep learning architectures operating on mel-spectrograms. Experimental results on three datasets, EMODB (German, acted), AESDD (Greek, acted), and DEMoS (Italian, in-the-wild), reveal that our proposed method outperforms the comparative methods on these datasets.
Specifically, we observe an average UAR increase of almost 18%, 8%, and 13%, respectively.
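The abstract outlines a pipeline — map each speech signal to a graph, extract statistical/structural graph features, and classify with a Random Forest under Leave-One-Speaker-Out CV — without giving construction details here. The sketch below is an illustrative approximation only: it assumes the natural visibility graph (a common time-series-to-graph mapping, not confirmed as the paper's choice), uses hypothetical toy features (degree statistics, density, clustering), and runs on synthetic signals rather than speech.

```python
import numpy as np
import networkx as nx
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import LeaveOneGroupOut

def natural_visibility_graph(x):
    """Map a 1-D series to a graph: samples are nodes; samples a and b are
    linked if every sample between them lies strictly below the straight
    line joining (a, x[a]) and (b, x[b])."""
    n = len(x)
    G = nx.Graph()
    G.add_nodes_from(range(n))
    for a in range(n):
        for b in range(a + 1, n):
            if all(x[c] < x[a] + (x[b] - x[a]) * (c - a) / (b - a)
                   for c in range(a + 1, b)):
                G.add_edge(a, b)
    return G

def graph_features(x):
    """Toy statistical/structural descriptors of the visibility graph."""
    G = natural_visibility_graph(np.asarray(x, dtype=float))
    deg = np.array([d for _, d in G.degree()], dtype=float)
    return np.array([deg.mean(), deg.std(), deg.max(),
                     nx.density(G), nx.average_clustering(G)])

# Synthetic demo: two "emotion" classes with different temporal structure,
# four "speakers", evaluated in a Leave-One-Speaker-Out scheme.
rng = np.random.default_rng(0)
X, y, speakers = [], [], []
for spk in range(4):
    for label in (0, 1):
        for _ in range(5):
            t = np.linspace(0, 1, 60)
            sig = (np.sin(2 * np.pi * (3 if label else 12) * t)
                   + 0.3 * rng.standard_normal(60))
            X.append(graph_features(sig)); y.append(label); speakers.append(spk)
X, y, speakers = np.array(X), np.array(y), np.array(speakers)

scores = []
for tr, te in LeaveOneGroupOut().split(X, y, groups=speakers):
    clf = RandomForestClassifier(n_estimators=100, random_state=0)
    clf.fit(X[tr], y[tr])
    scores.append(clf.score(X[te], y[te]))
print(f"LOSO accuracy: {np.mean(scores):.2f}")
```

Holding out one speaker per fold (here via scikit-learn's `LeaveOneGroupOut`) is what makes the evaluation speaker-independent, matching the LOSO-CV protocol named in the abstract.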
Pages: 11
Related Papers (50 total)
  • [21] Speech emotion recognition based on Graph-LSTM neural network
    Li, Yan
    Wang, Yapeng
    Yang, Xu
    Im, Sio-Kei
    [J]. EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING, 2023, 2023 (01)
  • [23] Emotion Recognition in Speech with Latent Discriminative Representations Learning
    Han, Jing
    Zhang, Zixing
    Keren, Gil
    Schuller, Bjorn
    [J]. ACTA ACUSTICA UNITED WITH ACUSTICA, 2018, 104 (05) : 737 - 740
  • [24] Evaluating Self-Supervised Speech Representations for Speech Emotion Recognition
    Atmaja, Bagus Tris
    Sasou, Akira
    [J]. IEEE ACCESS, 2022, 10 : 124396 - 124407
  • [25] Analysis of constant-Q filterbank based representations for speech emotion recognition
    Singh, Premjeet
    Waldekar, Shefali
    Sahidullah, Md
    Saha, Goutam
    [J]. DIGITAL SIGNAL PROCESSING, 2022, 130
  • [26] Speech Emotion Recognition Exploiting ASR-based and Phonological Knowledge Representations
    Liang, Shuang
    Xie, Xiang
    Zhan, Qingran
    Cheng, Hao
    [J]. 6TH INTERNATIONAL CONFERENCE ON INNOVATION IN ARTIFICIAL INTELLIGENCE, ICIAI2022, 2022, : 216 - 220
  • [27] Similarity learning for graph-based image representations
    de Mauro, C
    Diligenti, M
    Gori, M
    Maggini, M
    [J]. PATTERN RECOGNITION LETTERS, 2003, 24 (08) : 1115 - 1122
  • [28] Complexities of Graph-Based Representations for Elementary Functions
    Nagayama, Shinobu
    Sasao, Tsutomu
    [J]. IEEE TRANSACTIONS ON COMPUTERS, 2009, 58 (01) : 106 - 119
  • [29] GCFormer: A Graph Convolutional Transformer for Speech Emotion Recognition
    Gao, Yingxue
    Zhao, Huan
    Xiao, Yufeng
    Zhang, Zixing
    [J]. PROCEEDINGS OF THE 25TH INTERNATIONAL CONFERENCE ON MULTIMODAL INTERACTION, ICMI 2023, 2023, : 307 - 313
  • [30] Speech emotion recognition based on emotion perception
    Gang Liu
    Shifang Cai
    Ce Wang
    [J]. EURASIP Journal on Audio, Speech, and Music Processing, 2023