A Neural Network Architecture for Children's Audio-Visual Emotion Recognition

Cited: 1
Authors
Matveev, Anton [1 ]
Matveev, Yuri [1 ]
Frolova, Olga [1 ]
Nikolaev, Aleksandr [1 ]
Lyakso, Elena [1 ]
Affiliations
[1] St Petersburg Univ, Dept Higher Nervous Act & Psychophysiol, Child Speech Res Grp, St Petersburg 199034, Russia
Funding
Russian Science Foundation
Keywords
audio-visual speech; emotion recognition; children; multimodal fusion; speech; age
DOI
10.3390/math11224573
Chinese Library Classification
O1 [Mathematics]
Discipline codes
0701; 070101
Abstract
Detecting and understanding emotions is critical for our daily activities. As emotion recognition (ER) systems develop, we move beyond acted adult audio-visual speech to more difficult cases. In this work, we investigate the automatic classification of children's audio-visual emotional speech, which presents several challenges, including the lack of publicly available annotated datasets and the low performance of state-of-the-art audio-visual ER systems. We present a new corpus of children's audio-visual emotional speech that we collected. We then propose a neural network solution that improves the use of temporal relationships between the audio and video modalities in cross-modal fusion for children's audio-visual emotion recognition. We select a state-of-the-art neural network architecture as a baseline and present several modifications focused on deeper learning of cross-modal temporal relationships using attention. In experiments with the proposed approach and the selected baseline model, we observe a relative performance improvement of 2%. We conclude that a stronger focus on cross-modal temporal relationships may benefit ER systems for child-machine communication and for environments where qualified professionals work with children.
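The abstract describes attention-based cross-modal fusion of temporally structured audio and video features. The paper's actual architecture is not reproduced here; the sketch below only illustrates the general idea of one modality attending over the other's time steps via scaled dot-product attention. All names, dimensions, and the random toy features are illustrative assumptions, not details from the paper.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_modal_attention(queries, keys, values):
    """Scaled dot-product attention: the query modality attends
    over the key/value modality's time steps (a generic sketch,
    not the authors' exact fusion block)."""
    d = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(d)   # (Tq, Tk) temporal alignment scores
    weights = softmax(scores, axis=-1)       # each query step sums to 1 over key steps
    return weights @ values                  # (Tq, d) fused representation

# Toy features: hypothetical 10 audio frames and 6 video frames, 16-dim each.
rng = np.random.default_rng(0)
audio = rng.standard_normal((10, 16))
video = rng.standard_normal((6, 16))

# Audio attends to video and vice versa; the two outputs could then
# be pooled or concatenated for a joint audio-visual embedding.
a2v = cross_modal_attention(audio, video, video)  # shape (10, 16)
v2a = cross_modal_attention(video, audio, audio)  # shape (6, 16)
```

Because each modality's query sequence keeps its own temporal length while mixing in the other modality's content, this kind of block lets the network align audio frames with the video frames most relevant to them, which is the temporal relationship the paper's modifications aim to exploit more deeply.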
Pages: 17