Speech Emotion Recognition with Fusion of Acoustic- and Linguistic-Feature-Based Decisions

Cited by: 0
|
Authors
Nagase, Ryotaro [1 ]
Fukumori, Takahiro [1 ]
Yamashita, Yoichi [1 ]
Affiliations
[1] Ritsumeikan Univ, Grad Sch Informat Sci & Engn, Kusatsu, Shiga, Japan
Keywords
NEURAL-NETWORKS;
DOI
Not available
CLC Classification Number
TP [Automation Technology, Computer Technology];
Subject Classification Number
0812 ;
Abstract
In recent years, deep learning techniques have improved the performance of speech emotion recognition (SER), as they have for speech synthesis and speech recognition. Nevertheless, emotion recognition accuracy remains low, and current systems do not capture emotional features in detail. Multimodal processing is one technique that improves SER performance and can handle integrated emotional factors. Many researchers adopt various fusion methods to find the optimal approach for each case, but the synergistic effects of fusing the acoustic and linguistic features conveyed by speech have not been sufficiently observed and analyzed. In this paper, we propose a method of SER using acoustic and linguistic features at the utterance level. First, two emotion recognition systems, one using acoustic features and one using linguistic features, are trained on the Japanese Twitter-based emotional speech (JTES) corpus. We then aim to improve accuracy through early fusion, which concatenates the linguistic and acoustic features, and late fusion, which combines the values predicted by each model. The proposed methods achieve about 20% higher accuracy than classifiers that use only acoustic or only linguistic information, and several of them improve the recognition rate for individual emotions.
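The early and late fusion strategies described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the feature dimensions, emotion labels, and averaging rule for late fusion are illustrative assumptions. Early fusion concatenates the two feature vectors before a single classifier; late fusion combines the per-class probabilities predicted by two separate classifiers.

```python
import math
import random

random.seed(0)

# Hypothetical per-utterance feature vectors (dimensions are illustrative,
# not taken from the paper): one acoustic embedding, one linguistic embedding.
acoustic_feat = [random.gauss(0, 1) for _ in range(128)]
linguistic_feat = [random.gauss(0, 1) for _ in range(64)]

# --- Early fusion: concatenate the two feature vectors and feed the result
# to a single classifier (the classifier itself is omitted here). ---
early_input = acoustic_feat + linguistic_feat  # length 128 + 64 = 192

# --- Late fusion: run two separate classifiers and combine their predicted
# class probabilities, here by simple averaging (one common choice). ---
def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

NUM_CLASSES = 4  # e.g. neutral / joy / sadness / anger (illustrative labels)
acoustic_logits = [random.gauss(0, 1) for _ in range(NUM_CLASSES)]
linguistic_logits = [random.gauss(0, 1) for _ in range(NUM_CLASSES)]

late_probs = [
    (pa + pl) / 2
    for pa, pl in zip(softmax(acoustic_logits), softmax(linguistic_logits))
]
predicted_class = late_probs.index(max(late_probs))
```

In practice the two branches would be trained neural classifiers; the key structural difference is where the fusion happens, before the classifier (feature level) or after it (decision level).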
Pages: 725 - 730
Page count: 6
Related Papers
50 records in total
  • [21] Acoustic feature analysis and optimization for Bangla speech emotion recognition
    Sultana, Sadia
    Rahman, Mohammad Shahidur
    ACOUSTICAL SCIENCE AND TECHNOLOGY, 2023, 44 (03) : 157 - 166
  • [22] Acoustic feature selection for automatic emotion recognition from speech
    Rong, Jia
    Li, Gang
    Chen, Yi-Ping Phoebe
    INFORMATION PROCESSING & MANAGEMENT, 2009, 45 (03) : 315 - 328
  • [23] A Feature Fusion Model with Data Augmentation for Speech Emotion Recognition
    Tu, Zhongwen
    Liu, Bin
    Zhao, Wei
    Yan, Raoxin
    Zou, Yang
    APPLIED SCIENCES-BASEL, 2023, 13 (07):
  • [24] Multimodal Emotion Recognition Based on Feature Fusion
    Xu, Yurui
    Wu, Xiao
    Su, Hang
    Liu, Xiaorui
    2022 INTERNATIONAL CONFERENCE ON ADVANCED ROBOTICS AND MECHATRONICS (ICARM 2022), 2022, : 7 - 11
  • [25] Speech emotion recognition based on multi-feature and multi-lingual fusion
    Wang, Chunyi
    Ren, Ying
    Zhang, Na
    Cui, Fuwei
    Luo, Shiying
    MULTIMEDIA TOOLS AND APPLICATIONS, 2022, 81 (04) : 4897 - 4907
  • [26] Graph-Based Multi-Feature Fusion Method for Speech Emotion Recognition
    Liu, Xueyu
    Lin, Jie
    Wang, Chao
    INTERNATIONAL JOURNAL OF PATTERN RECOGNITION AND ARTIFICIAL INTELLIGENCE, 2024, 38 (16)
  • [27] Metric Learning Based Feature Representation with Gated Fusion Model for Speech Emotion Recognition
    Gao, Yuan
    Liu, JiaXing
    Wang, Longbiao
    Dang, Jianwu
    INTERSPEECH 2021, 2021, : 4503 - 4507
  • [28] Novel feature fusion method for speech emotion recognition based on multiple kernel learning
    Zhao, L. (zhaoli@seu.edu.cn), 1600, Southeast University (29):
  • [29] Meta-classifiers in acoustic and linguistic feature fusion-based affect recognition
    Schuller, B
    Villar, RJ
    Rigoll, G
    Lang, M
    2005 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS 1-5: SPEECH PROCESSING, 2005, : 325 - 328
  • [30] DEEP ENCODED LINGUISTIC AND ACOUSTIC CUES FOR ATTENTION BASED END TO END SPEECH EMOTION RECOGNITION
    Bhosale, Swapnil
    Chakraborty, Rupayan
    Kopparapu, Sunil Kumar
    2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 7189 - 7193