A Multi-Modal Emotion Recognition System Based on CNN-Transformer Deep Learning Technique

Cited by: 3
Authors
Karatay, Busra [1]
Bestepe, Deniz
Sailunaz, Kashfia
Ozyer, Tansel
Alhajj, Reda
Affiliations
[1] TOBB Univ Econ & Technol, Dept Comp Engn, Ankara, Turkey
Keywords
CNN; Deep Learning; emotion; emotion classification; Transformer;
DOI
10.1109/CDMA54072.2022.00029
CLC number
TP18 [Artificial Intelligence Theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Emotion analysis is a subject that researchers from various fields have studied for a long time. Different emotion detection methods have been developed for the text, audio, image, and video domains. Automated emotion detection from videos and pictures using machine learning and deep learning models has been of particular interest to researchers. In this paper, a deep learning framework that combines CNN and Transformer models is proposed to classify emotions using facial and body features extracted from videos. Facial and body features were extracted with OpenPose, and two operations, new video creation and frame selection, were tried in the data preprocessing stage. The experiments were conducted on two datasets, FABO and CK+. Our framework outperformed similar deep learning models with 99% classification accuracy on the FABO dataset, and most versions of the framework achieved over 90% accuracy on both the FABO and CK+ datasets.
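The pipeline the abstract describes — per-frame facial/body features fed to a Transformer-style module, then pooled into a clip-level emotion prediction — can be sketched as a toy, framework-free example. This is not the authors' implementation: all shapes, weights, and names below are illustrative assumptions, and the per-frame features are random stand-ins for what a CNN or OpenPose would produce.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative shapes only: 16 frames per clip, 64-dim per-frame feature
# vector (standing in for CNN/OpenPose features), 6 emotion classes.
T, D, C = 16, 64, 6

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention over the frame axis."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])   # (T, T) frame-to-frame weights
    return softmax(scores, axis=-1) @ V       # (T, D) temporally mixed features

# Random, untrained weights, purely to show the data flow.
Wq, Wk, Wv = (rng.normal(size=(D, D)) * 0.1 for _ in range(3))
W_cls = rng.normal(size=(D, C)) * 0.1

frames = rng.normal(size=(T, D))               # per-frame features from a CNN
attended = self_attention(frames, Wq, Wk, Wv)  # Transformer-style mixing
clip_repr = attended.mean(axis=0)              # temporal pooling to one vector
probs = softmax(clip_repr @ W_cls)             # emotion class probabilities
```

In a trained system the random matrices would be learned, and the attention block would be stacked with residual connections and feed-forward layers as in a standard Transformer encoder; the sketch only shows how frame-level features become a single clip-level class distribution.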
Pages: 145-150 (6 pages)