Deep learning-based multimodal emotion recognition from audio, visual, and text modalities: A systematic review of recent advancements and future prospects

Cited by: 21
Authors
Zhang, Shiqing [1]
Yang, Yijiao [1]
Chen, Chen [1]
Zhang, Xingnan [1]
Leng, Qingming [2]
Zhao, Xiaoming [1]
Affiliations
[1] Taizhou Univ, Inst Intelligent Informat Proc, Taizhou 318000, Zhejiang, Peoples R China
[2] Jiujiang Univ, Sch Elect & Informat Engn, Jiujiang 332005, Peoples R China
Funding
National Natural Science Foundation of China; National Science Foundation (USA);
Keywords
Multimodal emotion recognition; Deep learning; Feature extraction; Multimodal information fusion; review; FACIAL EXPRESSION RECOGNITION; INFORMATION FUSION; AFFECTIVE FEATURES; SENTIMENT ANALYSIS; NEURAL-NETWORKS; SPEECH; DATABASES; MODEL; DIMENSIONALITY; SIGNALS;
DOI
10.1016/j.eswa.2023.121692
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
Emotion recognition has recently attracted extensive interest due to its significant applications in human-computer interaction. Human emotion is expressed through various verbal and non-verbal channels such as audio, visual, and text. Emotion recognition is thus better suited to multimodal than to single-modal learning. Owing to their powerful feature learning capability, deep learning methods have recently been widely used to capture high-level emotional feature representations for multimodal emotion recognition (MER). This paper therefore makes the first effort to comprehensively summarize recent advances in deep learning-based multimodal emotion recognition (DL-MER) involving audio, visual, and text modalities. The review covers four points: (1) MER milestones are given to summarize the development trend of MER, and conventional multimodal emotional datasets are provided; (2) the core principles of typical deep learning models and their recent advancements are overviewed; (3) a systematic survey and taxonomy is provided to cover the state-of-the-art methods related to two key steps in a MER system, namely feature extraction and multimodal information fusion; (4) the research challenges and open issues in this field are discussed, and promising future directions are given.
Pages: 23