Speech emotion recognition via graph-based representations

Cited by: 4
Authors
Pentari, Anastasia [1]
Kafentzis, George [2]
Tsiknakis, Manolis [1, 3]
Affiliations
[1] Fdn Res & Technol Hellas, Inst Comp Sci, GR-70013 Iraklion, Greece
[2] Univ Crete, Comp Sci Dept, GR-70013 Iraklion, Greece
[3] Hellen Mediterranean Univ, Dept Elect & Comp Engn, Iraklion, Greece
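Keywords
FEATURES;
DOI
10.1038/s41598-024-52989-2
Chinese Library Classification (CLC) number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];
Discipline classification code
07; 0710; 09;
Abstract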
Speech emotion recognition (SER) has attracted growing interest over the last decades as part of affective computing. As a consequence, a variety of engineering approaches have been developed to address the SER problem, exploiting different features, learning algorithms, and datasets. In this paper, we propose applying graph theory to classify emotionally colored speech signals. Graph theory provides tools for extracting statistical as well as structural information from any time series. We propose to use this information as a novel feature set. Furthermore, we suggest setting a unique feature-based identity for each emotion belonging to each speaker. The emotion classification is performed by a Random Forest classifier in a Leave-One-Speaker-Out Cross-Validation (LOSO-CV) scheme. The proposed method is compared with two state-of-the-art approaches involving well-known hand-crafted features as well as deep learning architectures operating on mel-spectrograms. Experimental results on three datasets, EMODB (German, acted), AESDD (Greek, acted), and DEMoS (Italian, in-the-wild), reveal that our proposed method outperforms the comparative methods on these datasets.
Specifically, we observe an average UAR increase of almost 18%, 8% and 13%, respectively.
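The pipeline outlined in the abstract (graph features from a time series, speaker-independent evaluation, UAR scoring) can be illustrated with a minimal sketch. The abstract does not say which graph construction the authors use, so the natural visibility graph below is an assumption chosen only because it is a common way to map a time series to a graph. The four descriptors and all function names are likewise hypothetical, and the classifier itself is left out: in a full experiment a Random Forest would be fitted on the training indices of each fold.

```python
from collections import defaultdict
from statistics import mean, pstdev


def visibility_graph(series):
    """Natural visibility graph (assumed construction, not taken from the paper):
    samples i < j are linked when every sample between them lies strictly
    below the straight line joining them."""
    n = len(series)
    edges = set()
    for i in range(n - 1):
        for j in range(i + 1, n):
            visible = all(
                series[k] < series[i] + (series[j] - series[i]) * (k - i) / (j - i)
                for k in range(i + 1, j)
            )
            if visible:
                edges.add((i, j))
    return edges


def graph_features(series):
    """Illustrative descriptors: mean degree, degree spread, max degree, density."""
    n = len(series)
    edges = visibility_graph(series)
    degree = [0] * n
    for i, j in edges:
        degree[i] += 1
        degree[j] += 1
    density = 2 * len(edges) / (n * (n - 1))
    return [mean(degree), pstdev(degree), max(degree), density]


def loso_splits(speakers):
    """Leave-One-Speaker-Out: yield (held_out, train_indices, test_indices)."""
    for held_out in sorted(set(speakers)):
        train = [i for i, s in enumerate(speakers) if s != held_out]
        test = [i for i, s in enumerate(speakers) if s == held_out]
        yield held_out, train, test


def unweighted_average_recall(y_true, y_pred):
    """UAR: mean of the per-class recalls, so every emotion counts equally."""
    hits, totals = defaultdict(int), defaultdict(int)
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        hits[t] += int(t == p)
    return mean([hits[c] / totals[c] for c in totals])


frame = [1.0, 3.0, 2.0, 4.0, 1.0]  # toy stand-in for a short speech frame
features = graph_features(frame)
splits = list(loso_splits(["s1", "s1", "s2", "s3"]))
uar = unweighted_average_recall(
    ["angry", "angry", "sad", "sad"], ["angry", "sad", "sad", "sad"]
)
```

Holding out all utterances of one speaker per fold keeps the test speaker unseen during training, and UAR avoids rewarding a model that favors the most frequent emotion class.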
Pages: 11