Graph-Based Multi-Feature Fusion Method for Speech Emotion Recognition

Cited: 0
Authors
Liu, Xueyu [1 ]
Lin, Jie [1 ]
Wang, Chao [1 ]
Affiliations
[1] School of Economics and Management, Tongji University, No. 1, Zhangwu Road, Shanghai 200090, China
Funding
National Natural Science Foundation of China
Keywords
Data fusion; Feature extraction; One-dimensional; Signal encoding; Speech enhancement
DOI
10.1142/S021800142450023X
Abstract
Exploring a proper way to fuse multiple speech features for cross-corpus speech emotion recognition is crucial, as different audio features can provide complementary cues about a speaker's emotional state. Speech emotion recognition allows computers to infer the specific emotional condition of the speaker from speech, which is of great significance to the development of human-computer interaction technology. Most previous approaches extract only a single speech feature for emotion recognition, and existing fusion methods such as concatenation, parallel connection, and splicing ignore the heterogeneous patterns in the interactions between feature pairs, limiting the performance of existing systems. In this paper, we propose a novel graph-based fusion method that explicitly models the relationship between every pair of audio features, offering a new direction for research on speech feature fusion. Specifically, we propose a multi-dimensional edge feature learning strategy, the graph-based multi-feature fusion method for speech emotion recognition. It represents each speech feature as a node and learns multi-dimensional edge features that explicitly describe the relationship between each feature-feature pair in the context of emotion recognition. In this way, the learned multi-dimensional edge features encode speech feature-level information from both the vertex and edge dimensions. Our approach consists of three modules: an Audio Feature Generation (AFG) module, an Audio-Feature Multi-dimensional Edge Feature (AMEF) module, and a Speech Emotion Recognition (SER) module. The proposed method yielded satisfactory results on the SEWA dataset and outperformed the baseline of the AVEC 2019 Workshop and Challenge. Using the two SEWA cultures, German and Hungarian, as our training and validation sets, the CCC scores for German improve by 17.28% for arousal and 7.93% for liking, and the CCC scores for Hungarian improve by 11.15% for arousal and 131.11% for valence. Our method also achieves a 13% improvement over alternative fusion techniques, including a one-dimensional edge-based feature fusion approach. Experiments on parts of the Aff-Wild2 dataset demonstrate that our approach exhibits a certain degree of generalizability and robustness. Code is available at https://github.com/ChaosWang666/Graph-based-multi-Feature-fusion-method. © 2024 World Scientific Publishing Company.
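The abstract names the three modules (AFG, AMEF, SER) but this record carries no implementation detail, so the following is only a minimal PyTorch sketch of the core idea described above: each speech feature becomes a graph node, and a multi-dimensional edge embedding is learned for every ordered node pair. The MLP edge encoder, the mean aggregation, the GRU-style node update, all dimensions, and the feature names in the usage comment are illustrative assumptions, not the authors' published design; the ccc helper implements the standard Concordance Correlation Coefficient used as the evaluation metric above.

import torch
import torch.nn as nn


class AMEFFusion(nn.Module):
    """Sketch of graph-based multi-feature fusion: each speech feature
    vector is a node; a multi-dimensional edge embedding is learned for
    every ordered node pair (hypothetical design, not the paper's)."""

    def __init__(self, num_feats: int, feat_dim: int, edge_dim: int = 32):
        super().__init__()
        # Edge encoder: maps a concatenated (sender, receiver) pair to a
        # multi-dimensional edge feature (an assumption; the abstract only
        # states that edge features are multi-dimensional).
        self.edge_mlp = nn.Sequential(
            nn.Linear(2 * feat_dim, edge_dim),
            nn.ReLU(),
            nn.Linear(edge_dim, edge_dim),
        )
        # Node update from aggregated incoming edge messages.
        self.node_update = nn.GRUCell(edge_dim, feat_dim)
        # SER head predicting arousal, valence and liking.
        self.head = nn.Linear(num_feats * feat_dim, 3)

    def forward(self, nodes: torch.Tensor) -> torch.Tensor:
        # nodes: (batch, num_feats, feat_dim), one node per audio feature set
        b, n, d = nodes.shape
        send = nodes.unsqueeze(2).expand(b, n, n, d)  # sender of each edge
        recv = nodes.unsqueeze(1).expand(b, n, n, d)  # receiver of each edge
        edges = self.edge_mlp(torch.cat([send, recv], dim=-1))  # (b, n, n, e)
        msgs = edges.mean(dim=1)  # mean over senders: incoming message per node
        updated = self.node_update(
            msgs.reshape(b * n, -1), nodes.reshape(b * n, d)
        )
        return self.head(updated.reshape(b, n * d))


def ccc(pred: torch.Tensor, gold: torch.Tensor) -> torch.Tensor:
    """Concordance Correlation Coefficient, the metric reported above."""
    pm, gm = pred.mean(), gold.mean()
    pv, gv = pred.var(unbiased=False), gold.var(unbiased=False)
    cov = ((pred - pm) * (gold - gm)).mean()
    return 2.0 * cov / (pv + gv + (pm - gm) ** 2)


# Toy usage: four hypothetical feature streams (e.g. MFCC, eGeMAPS,
# Deep Spectrum, BoAW), each projected to a 64-dim node vector.
model = AMEFFusion(num_feats=4, feat_dim=64)
out = model(torch.randn(8, 4, 64))  # -> (8, 3) predictions
print(out.shape, ccc(out[:, 0], torch.randn(8)).item())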
Related Papers
50 records in total
  • [1] Multi-feature Fusion Speech Emotion Recognition Based on SVM
    Zeng, Xiaoping
    Dong, Li
    Chen, Guanghui
    Dong, Qi
    [J]. PROCEEDINGS OF 2020 IEEE 10TH INTERNATIONAL CONFERENCE ON ELECTRONICS INFORMATION AND EMERGENCY COMMUNICATION (ICEIEC 2020), 2020, : 77 - 80
  • [2] Speech emotion recognition based on multi-feature and multi-lingual fusion
    Wang, Chunyi
    Ren, Ying
    Zhang, Na
    Cui, Fuwei
    Luo, Shiying
    [J]. MULTIMEDIA TOOLS AND APPLICATIONS, 2022, 81 (04) : 4897 - 4907
  • [3] A Multi-Feature Fusion Speech Emotion Recognition Method Based on Frequency Band Division and Improved Residual Network
    Guo, Yi
    Zhou, Yongping
    Xiong, Xuejun
    Jiang, Xin
    Tian, Hanbing
    Zhang, Qianxue
    [J]. IEEE ACCESS, 2023, 11 : 86013 - 86024
  • [4] MULTI-FEATURE FUSION EMOTION RECOGNITION BASED ON RESTING EEG
    Zhang, Jun-An
    Gu, Liping
    Chen, Yongqiang
    Zhu, Geng
    Ou, Lang
    Wang, Liyan
    Li, Xiaoou
    Zhong, Lichang
    [J]. JOURNAL OF MECHANICS IN MEDICINE AND BIOLOGY, 2022, 22 (03)
  • [5] Speech emotion recognition based on multi-feature speed rate and LSTM
    Yang, Zijun
    Li, Zhen
    Zhou, Shi
    Zhang, Lifeng
    Serikawa, Seiichi
    [J]. NEUROCOMPUTING, 2024, 601
  • [6] Multi-feature graph-based object tracking
    Taj, Murtaza
    Maggio, Emilio
    Cavallaro, Andrea
    [J]. MULTIMODAL TECHNOLOGIES FOR PERCEPTION OF HUMANS, 2007, 4122 : 190 - 199
  • [7] A Multi-Feature Multi-Classifier System for Speech Emotion Recognition
    Li, Pengcheng
    Song, Yan
    Wang, Peisen
    Dai, Lirong
    [J]. 2018 FIRST ASIAN CONFERENCE ON AFFECTIVE COMPUTING AND INTELLIGENT INTERACTION (ACII ASIA), 2018
  • [8] Chinese Address Recognition Method Based on Multi-Feature Fusion
    Wang, Yansong
    Wang, Meng
    Ding, Chaoling
    Yang, Xinghua
    Chen, Jian
    [J]. IEEE ACCESS, 2022, 10 : 108905 - 108913
  • [9] An Effective Method for Cirrhosis Recognition Based on Multi-Feature Fusion
    Chen, Yameng
    Sun, Gengxin
    Lei, Yiming
    Zhang, Jinpeng
    [J]. NINTH INTERNATIONAL CONFERENCE ON GRAPHIC AND IMAGE PROCESSING (ICGIP 2017), 2018, 10615