Deep learning-based multimodal emotion recognition from audio, visual, and text modalities: A systematic review of recent advancements and future prospects

Cited by: 21
Authors
Zhang, Shiqing [1]
Yang, Yijiao [1]
Chen, Chen [1]
Zhang, Xingnan [1]
Leng, Qingming [2]
Zhao, Xiaoming [1]
Affiliations
[1] Taizhou Univ, Inst Intelligent Informat Proc, Taizhou 318000, Zhejiang, Peoples R China
[2] Jiujiang Univ, Sch Elect & Informat Engn, Jiujiang 332005, Peoples R China
Funding
National Natural Science Foundation of China; National Science Foundation (USA);
Keywords
Multimodal emotion recognition; Deep learning; Feature extraction; Multimodal information fusion; review; FACIAL EXPRESSION RECOGNITION; INFORMATION FUSION; AFFECTIVE FEATURES; SENTIMENT ANALYSIS; NEURAL-NETWORKS; SPEECH; DATABASES; MODEL; DIMENSIONALITY; SIGNALS;
DOI
10.1016/j.eswa.2023.121692
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Emotion recognition has recently attracted extensive interest due to its significant applications in human-computer interaction. The expression of human emotion depends on various verbal and non-verbal channels such as audio, visual, and text. Emotion recognition is thus better treated as a multimodal rather than a single-modal learning problem. Owing to their powerful feature learning capability, a wide range of deep learning methods have recently been leveraged to capture high-level emotional feature representations for multimodal emotion recognition (MER). This paper therefore makes the first effort to comprehensively summarize recent advances in deep learning-based multimodal emotion recognition (DL-MER) involving audio, visual, and text modalities. We focus on four aspects: (1) MER milestones are given to summarize the development trend of MER, and conventional multimodal emotional datasets are provided; (2) the core principles of typical deep learning models and their recent advancements are overviewed; (3) a systematic survey and taxonomy is provided to cover the state-of-the-art methods related to two key steps in a MER system, namely feature extraction and multimodal information fusion; (4) the research challenges and open issues in this field are discussed, and promising future directions are given.
Pages: 23
Related papers
48 in total
  • [31] A systematic literature review of deep learning-based text summarization: Techniques, input representation, training strategies, mechanisms, datasets, evaluation, and challenges
    Saleh, Marwa E.
    Wazery, Yaser M.
    Ali, Abdelmgeid A.
    EXPERT SYSTEMS WITH APPLICATIONS, 2024, 252
  • [32] DREAM: Deep Learning-Based Recognition of Emotions From Multiple Affective Modalities Using Consumer-Grade Body Sensors and Video Cameras
    Sharma, Aditi
    Kumar, Akshi
    IEEE TRANSACTIONS ON CONSUMER ELECTRONICS, 2024, 70 (01) : 1434 - 1442
  • [33] Machine Learning-Based Automated Diagnostic Systems Developed for Heart Failure Prediction Using Different Types of Data Modalities: A Systematic Review and Future Directions
    Javeed, Ashir
    Khan, Shafqat Ullah
    Ali, Liaqat
    Ali, Sardar
    Imrana, Yakubu
    Rahman, Atiqur
    Computational and Mathematical Methods in Medicine, 2022, 2022
  • [34] A multimodal fusion-based deep learning framework combined with keyframe extraction and spatial and channel attention for group emotion recognition from videos
    Qi, Shubao
    Liu, Baolin
    PATTERN ANALYSIS AND APPLICATIONS, 2023, 26 (03) : 1493 - 1503
  • [35] A multimodal fusion-based deep learning framework combined with local-global contextual TCNs for continuous emotion recognition from videos
    Shi, Congbao
    Zhang, Yuanyuan
    Liu, Baolin
    APPLIED INTELLIGENCE, 2024, 54 (04) : 3040 - 3057
  • [36] A multimodal fusion-based deep learning framework combined with keyframe extraction and spatial and channel attention for group emotion recognition from videos
    Shubao Qi
    Baolin Liu
    Pattern Analysis and Applications, 2023, 26 (3) : 1493 - 1503
  • [37] A multimodal fusion-based deep learning framework combined with local-global contextual TCNs for continuous emotion recognition from videos
    Congbao Shi
    Yuanyuan Zhang
    Baolin Liu
    Applied Intelligence, 2024, 54 : 3040 - 3057
  • [38] Framework for Deep Learning-Based Language Models Using Multi-Task Learning in Natural Language Understanding: A Systematic Literature Review and Future Directions
    Samant, Rahul Manohar
    Bachute, Mrinal R.
    Gite, Shilpa
    Kotecha, Ketan
    IEEE ACCESS, 2022, 10 : 17078 - 17097
  • [39] A systematic review of deep learning-based cervical cytology screening: from cell identification to whole slide image analysis
    Jiang, Peng
    Li, Xuekong
    Shen, Hui
    Chen, Yuqi
    Wang, Lang
    Chen, Hua
    Feng, Jing
    Liu, Juan
    ARTIFICIAL INTELLIGENCE REVIEW, 2023,
  • [40] A systematic review of deep learning-based cervical cytology screening: from cell identification to whole slide image analysis
    Jiang, Peng
    Li, Xuekong
    Shen, Hui
    Chen, Yuqi
    Wang, Lang
    Chen, Hua
    Feng, Jing
    Liu, Juan
    ARTIFICIAL INTELLIGENCE REVIEW, 2023, 56 (03) : S2687 - S2758