Speech Emotion Recognition with Fusion of Acoustic- and Linguistic-Feature-Based Decisions

Cited by: 0
Authors:
Nagase, Ryotaro [1 ]
Fukumori, Takahiro [1 ]
Yamashita, Yoichi [1 ]
Affiliations:
[1] Ritsumeikan Univ, Grad Sch Informat Sci & Engn, Kusatsu, Shiga, Japan
Keywords:
NEURAL-NETWORKS;
DOI: not available
Chinese Library Classification: TP [automation technology, computer technology]
Discipline code: 0812
Abstract
In recent years, deep learning has improved the performance of speech emotion recognition (SER), as it has for speech synthesis and speech recognition. Nevertheless, emotion recognition accuracy remains low, and existing systems do not capture emotional features in detail. Multimodal processing is one technique that improves SER performance by integrating multiple emotional factors. Many researchers have adopted various fusion methods to find the optimal method for each case; however, the synergistic effects of fusing the acoustic and linguistic features conveyed by speech have not been sufficiently observed and analyzed. In this paper, we propose a method of SER that uses acoustic and linguistic features at the utterance level. First, two emotion recognition systems, one using acoustic features and the other linguistic features, are trained on Japanese Twitter-based emotional speech (JTES). We then aim to improve accuracy with early fusion, which fuses the linguistic and acoustic features, and late fusion, which fuses the values predicted by each model. Consequently, the proposed methods achieve about 20% higher accuracy than classifiers that use only acoustic or only linguistic information, and several of the methods also improve the recognition rate for each emotion.
Pages: 725-730 (6 pages)
Related Papers (50 total)
  • [41] Combined CNN LSTM with attention for speech emotion recognition based on feature-level fusion
    Liu Y.
    Chen A.
    Zhou G.
    Yi J.
    Xiang J.
    Wang Y.
    Multimedia Tools and Applications, 2024, 83 (21) : 59839 - 59859
  • [42] DOMAIN-ADVERSARIAL AUTOENCODER WITH ATTENTION BASED FEATURE LEVEL FUSION FOR SPEECH EMOTION RECOGNITION
    Gao, Yuan
    Liu, JiaXing
    Wang, Longbiao
    Dang, Jianwu
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 6314 - 6318
  • [43] Transformer-Based Multilingual Speech Emotion Recognition Using Data Augmentation and Feature Fusion
    Al-onazi, Badriyya B.
    Nauman, Muhammad Asif
    Jahangir, Rashid
    Malik, Muhmmad Mohsin
    Alkhammash, Eman H.
    Elshewey, Ahmed M.
    APPLIED SCIENCES-BASEL, 2022, 12 (18):
  • [44] Comparison between Decision-Level and Feature-Level Fusion of Acoustic and Linguistic Features for Spontaneous Emotion Recognition
    Planet, Santiago
    Iriondo, Ignasi
    7TH IBERIAN CONFERENCE ON INFORMATION SYSTEMS AND TECHNOLOGIES (CISTI 2012), 2012,
  • [45] ANN based Decision Fusion for Speech Emotion Recognition
    Xu, Lu
    Xu, Mingxing
    Yang, Dali
    INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, VOLS 1-5, 2009, : 2003 - +
  • [46] Comparison between Decision-Level and Feature-Level Fusion of Acoustic and Linguistic Features for Spontaneous Emotion Recognition
    Planet, Santiago
    Iriondo, Ignasi
    SISTEMAS Y TECNOLOGIAS DE INFORMACION, VOLS 1 AND 2, 2012, : 199 - 204
  • [47] FUSION APPROACHES FOR EMOTION RECOGNITION FROM SPEECH USING ACOUSTIC AND TEXT-BASED FEATURES
    Pepino, Leonardo
    Riera, Pablo
    Ferrer, Luciana
    Gravano, Agustin
    2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 6484 - 6488
  • [48] Acoustic Emotion Recognition based on Fusion of Multiple Feature-Dependent Deep Boltzmann Machines
    Poon-Feng, Kelvin
    Huang, Dong-Yan
    Dong, Minghui
    Li, Haizhou
    2014 9TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP), 2014, : 584 - +
  • [49] Feature representation for speech emotion recognition
    Abdollahpour, Mehdi
    Zamani, Lafar
    Rad, Hamidreza Saligheh
    2017 25TH IRANIAN CONFERENCE ON ELECTRICAL ENGINEERING (ICEE), 2017, : 1465 - 1468
  • [50] Deep fusion framework for speech command recognition using acoustic and linguistic features
    Sunakshi Mehra
    Seba Susan
    Multimedia Tools and Applications, 2023, 82 : 38667 - 38691