Speech Emotion Recognition with Fusion of Acoustic- and Linguistic-Feature-Based Decisions

Cited by: 0
Authors:
Nagase, Ryotaro [1]
Fukumori, Takahiro [1]
Yamashita, Yoichi [1]
Affiliations:
[1] Ritsumeikan Univ, Grad Sch Informat Sci & Engn, Kusatsu, Shiga, Japan
Keywords:
NEURAL-NETWORKS;
DOI:
not available
Chinese Library Classification:
TP [automation technology; computer technology];
Discipline code:
0812
Abstract:
In recent years, deep learning has improved the performance of speech emotion recognition (SER), as it has for speech synthesis and speech recognition. However, emotion recognition still suffers from low accuracy and does not capture emotional features in detail. Multimodal processing is one technique that improves SER performance and can handle integrated emotional factors. Many researchers adopt various fusion methods to find the optimal one for each case. However, the synergistic effects of fusing the acoustic and linguistic features conveyed by speech have not been sufficiently observed and analyzed. In this paper, we propose an SER method that uses acoustic and linguistic features at the utterance level. First, two emotion recognition systems, one using acoustic features and one using linguistic features, are trained on the Japanese Twitter-based emotional speech (JTES) corpus. We then aim to improve accuracy through early fusion, which fuses the linguistic and acoustic features, and late fusion, which fuses the values predicted by each model. The proposed methods achieve about 20% higher accuracy than classifiers that use only acoustic or only linguistic information, and several of them also improve the recognition rate for individual emotions.
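The two fusion strategies named in the abstract can be sketched as follows. This is a minimal illustration, not the authors' actual models: early fusion concatenates utterance-level acoustic and linguistic feature vectors before a single classifier, while late fusion combines the per-class probabilities predicted by two unimodal models. The feature dimensions, the four-emotion label set, and the equal-weight averaging are assumptions made for the example.

```python
import numpy as np

def early_fusion(acoustic_feats, linguistic_feats):
    """Early fusion: concatenate utterance-level feature vectors
    so a single classifier sees both modalities at once."""
    return np.concatenate([np.asarray(acoustic_feats),
                           np.asarray(linguistic_feats)], axis=-1)

def late_fusion(p_acoustic, p_linguistic, weight=0.5):
    """Late fusion: weighted average of the per-class posteriors
    predicted independently by the acoustic and linguistic models."""
    return weight * np.asarray(p_acoustic) + (1.0 - weight) * np.asarray(p_linguistic)

# Toy example with four emotion classes (hypothetical posteriors).
p_a = [0.1, 0.6, 0.2, 0.1]   # acoustic model output
p_l = [0.2, 0.3, 0.4, 0.1]   # linguistic model output
fused = late_fusion(p_a, p_l)
predicted = int(np.argmax(fused))  # index of the most probable emotion
```

With equal weights the fused posteriors remain a valid distribution, and the decision can differ from either unimodal model alone, which is the effect the paper's late-fusion experiments measure.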
Pages: 725-730
Page count: 6