A Cross-Domain Exploration of Audio and Textual Data for Multi-Modal Emotion Detection

Authors
Haque, Mohd Ariful [1]
George, Roy [1]
Rifat, Rakib Hossain [2]
Uddin, Md Shihab [3]
Kamal, Marufa [3]
Gupta, Kishor Datta [1]
Affiliations
[1] Clark Atlanta Univ, Atlanta, GA 30314 USA
[2] BRAC Univ, Dhaka, Bangladesh
[3] Comilla Univ, Cumilla, Bangladesh
Keywords
Emotion Detection; Bi-LSTM; distilroberta base; Ensemble Methods; Multi-Modal Emotion Detection
DOI
10.1145/3652037.3663943
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Sentiment and emotion analysis is a challenging problem that has attracted research attention. The difficulty of emotion and sentiment recognition stems from variability in expression, cultural and individual differences, context dependency, and similar factors. This work takes an exploratory approach to the problem, classifying emotion with machine learning (ML) applied to textual and auditory data sources. We create a pipeline that examines both textual and auditory inputs, resulting in more reliable emotion classification. The study uses multiple audio and textual datasets to predict four distinct emotions: Angry, Fearful, Happy, and Neutral. For audio, a four-layer Bi-LSTM model achieved 95% accuracy in emotion analysis from auditory clips. The audio training set contained 2391 samples: Angry (20%), Fearful (18%), Happy (38%), and Neutral (24%). The validation set of 713 samples and the test set of 312 samples had emotion proportions comparable to the training set. For text, we merged four datasets and used the "emotion english distilroberta base" model [5], achieving 90% accuracy on the test data. The text training set was distributed as Angry (25%), Fearful (23%), Happy (23%), and Neutral (29%). The validation set comprised 305 samples and the test set 712 samples, both with emotion distributions similar to the training set. We develop an application that combines both classifications to obtain a robust classification of arbitrary audio tracks.
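The abstract does not say how the audio and text classifications are combined. As a minimal sketch only, assuming a simple weighted average of per-class probabilities (late fusion), the combination step could look like the following. The function name `fuse_predictions`, the `audio_weight` parameter, and the example scores are all hypothetical and are not taken from the paper.

```python
from typing import Dict, Tuple

# The four emotion classes named in the abstract.
EMOTIONS = ("Angry", "Fearful", "Happy", "Neutral")


def fuse_predictions(
    audio_probs: Dict[str, float],
    text_probs: Dict[str, float],
    audio_weight: float = 0.5,  # hypothetical weight, not from the paper
) -> Tuple[str, Dict[str, float]]:
    """Weighted average of two per-class probability distributions."""
    if not 0.0 <= audio_weight <= 1.0:
        raise ValueError("audio_weight must lie in [0, 1]")
    fused = {
        label: audio_weight * audio_probs[label]
        + (1.0 - audio_weight) * text_probs[label]
        for label in EMOTIONS
    }
    # The predicted emotion is the class with the highest fused score.
    best = max(fused, key=fused.get)
    return best, fused


# Made-up scores for one clip and its transcript (illustration only).
audio = {"Angry": 0.10, "Fearful": 0.20, "Happy": 0.60, "Neutral": 0.10}
text = {"Angry": 0.05, "Fearful": 0.05, "Happy": 0.40, "Neutral": 0.50}
label, fused = fuse_predictions(audio, text)
print(label, fused)
```

With equal weights the two models each contribute half of the final score; shifting `audio_weight` toward 1.0 would favor the audio model, which the abstract reports as the more accurate of the two on its own test data.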
Pages: 375-381
Number of pages: 7