Multimodal Affective Communication Analysis: Fusing Speech Emotion and Text Sentiment Using Machine Learning

Authors
Resende Faria, Diego [1]
Weinberg, Abraham Itzhak [2]
Ayrosa, Pedro Paulo [3, 4]
Affiliations
[1] Univ Hertfordshire, Sch Phys Engn & Comp Sci, Hatfield AL10 9AB, England
[2] AI Weinberg AI Experts, IL-90850 Tel Aviv, Israel
[3] Univ Estadual Londrina, LABTED, BR-86057970 Londrina, Brazil
[4] Univ Estadual Londrina, Comp Sci Dept, BR-86057970 Londrina, Brazil
Source
APPLIED SCIENCES-BASEL | 2024 / Vol. 14 / Issue 15
Keywords
speech emotion recognition; sentiment analysis; affective communication; data fusion; multimodality; machine learning; deep learning; dynamic Bayesian mixture model
DOI
10.3390/app14156631
Chinese Library Classification number
O6 [Chemistry]
Discipline classification code
0703
Abstract
Affective communication, encompassing verbal and non-verbal cues, is crucial for understanding human interactions. This study introduces a novel framework for enhancing emotional understanding by fusing speech emotion recognition (SER) and sentiment analysis (SA). We leverage diverse features and both classical and deep learning models, including Gaussian naive Bayes (GNB), support vector machines (SVMs), random forests (RFs), multilayer perceptron (MLP), and a 1D convolutional neural network (1D-CNN), to accurately discern and categorize emotions in speech. We further extract text sentiment from speech-to-text conversion, analyzing it using pre-trained models like bidirectional encoder representations from transformers (BERT), generative pre-trained transformer 2 (GPT-2), and logistic regression (LR). To improve individual model performance for both SER and SA, we employ an extended dynamic Bayesian mixture model (DBMM) ensemble classifier. Our most significant contribution is the development of a novel two-layered DBMM (2L-DBMM) for multimodal fusion. This model effectively integrates speech emotion and text sentiment, enabling the classification of more nuanced, second-level emotional states. Evaluating our framework on the EmoUERJ (Portuguese) and ESD (English) datasets, the extended DBMM achieves accuracy rates of 96% and 98% for SER, 85% and 95% for SA, and 96% and 98% for combined emotion classification using the 2L-DBMM, respectively. Our findings demonstrate the superior performance of the extended DBMM for individual modalities compared to individual classifiers and the 2L-DBMM for merging different modalities, highlighting the value of ensemble methods and multimodal fusion in affective communication analysis. The results underscore the potential of our approach in enhancing emotional understanding with broad applications in fields like mental health assessment, human-robot interaction, and cross-cultural communication.
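The ensemble idea in the abstract, combining the class posteriors of several base classifiers into one decision, can be illustrated with a minimal sketch. The abstract does not give the DBMM equations, so everything below is an assumption made for illustration: the entropy-based weighting rule, the use of a prior carried over from the previous time step, and the function names `entropy_weights` and `mixture_fusion` are all hypothetical, and the example covers only a single fusion layer, not the paper's 2L-DBMM.

```python
def normalize(values):
    """Scale non-negative values so they sum to 1."""
    total = sum(values)
    return [v / total for v in values]


def entropy_weights(entropies):
    """Assumed weighting rule: classifiers with lower entropy
    (more confident on validation data) receive larger weights."""
    total = sum(entropies)
    return normalize([1.0 - h / total for h in entropies])


def mixture_fusion(posteriors, weights, prior=None):
    """Weighted mixture of per-classifier posteriors, multiplied by a prior.

    posteriors: one probability vector per base classifier
    weights:    one weight per base classifier, summing to 1
    prior:      belief from the previous time step (uniform if omitted)
    """
    n_classes = len(posteriors[0])
    if prior is None:
        prior = [1.0 / n_classes] * n_classes
    mixed = [
        sum(w * p[c] for w, p in zip(weights, posteriors))
        for c in range(n_classes)
    ]
    return normalize([pr * m for pr, m in zip(prior, mixed)])


# Illustrative numbers only, not results from the paper.
posteriors = [
    [0.7, 0.2, 0.1],  # hypothetical output of classifier A
    [0.5, 0.3, 0.2],  # hypothetical output of classifier B
    [0.2, 0.5, 0.3],  # hypothetical output of classifier C
]
weights = entropy_weights([0.5, 1.0, 1.5])
fused = mixture_fusion(posteriors, weights)
predicted_class = fused.index(max(fused))
print(fused, predicted_class)
```

Feeding `fused` back in as `prior` on the next frame would give the sketch a simple temporal component; whether the paper's extended DBMM does this in the same way is not stated in the abstract.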
Pages: 28