A Neural Network Architecture for Children's Audio-Visual Emotion Recognition

Cited: 1
Authors
Matveev, Anton [1 ]
Matveev, Yuri [1 ]
Frolova, Olga [1 ]
Nikolaev, Aleksandr [1 ]
Lyakso, Elena [1 ]
Affiliations
[1] St Petersburg Univ, Dept Higher Nervous Act & Psychophysiol, Child Speech Res Grp, St Petersburg 199034, Russia
Funding
Russian Science Foundation
Keywords
audio-visual speech; emotion recognition; children; multimodal fusion; speech; age
DOI
10.3390/math11224573
CLC Classification Number
O1 [Mathematics]
Discipline Classification Code
0701; 070101
Abstract
Detecting and understanding emotions are critical to our daily activities. As emotion recognition (ER) systems mature, we move beyond acted adult audio-visual speech to more difficult cases. In this work, we investigate the automatic classification of children's audio-visual emotional speech, which presents several challenges, including the lack of publicly available annotated datasets and the low performance of state-of-the-art audio-visual ER systems. We first present a new corpus of children's audio-visual emotional speech that we collected. We then propose a neural network solution that improves the use of the temporal relationships between the audio and video modalities in cross-modal fusion for children's audio-visual emotion recognition. We select a state-of-the-art neural network architecture as a baseline and introduce several modifications aimed at deeper learning of cross-modal temporal relationships using attention. In experiments with the proposed approach and the selected baseline model, we observe a relative performance improvement of 2%. We conclude that a stronger focus on cross-modal temporal relationships may benefit ER systems for child-machine communication and for environments where qualified professionals work with children.
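The abstract does not spell out the fusion mechanism in detail; the following minimal PyTorch sketch only illustrates the general idea of cross-modal attention between audio and video feature sequences. The class name, feature dimensions, pooling strategy, and classifier head are assumptions made for illustration, not the authors' implementation.

```python
# Illustrative sketch (not the authors' code): cross-modal attention fusion of
# audio and video feature sequences, in the spirit described in the abstract.
# All dimensions and layer choices below are assumptions.
import torch
import torch.nn as nn


class CrossModalAttentionFusion(nn.Module):
    """Audio frames attend over video frames and vice versa; the two attended
    streams are pooled over time and concatenated for emotion classification."""

    def __init__(self, dim: int = 256, num_heads: int = 4, num_classes: int = 4):
        super().__init__()
        # Audio-as-query attention over video keys/values, and the reverse.
        self.audio_to_video = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.video_to_audio = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.classifier = nn.Linear(2 * dim, num_classes)

    def forward(self, audio: torch.Tensor, video: torch.Tensor) -> torch.Tensor:
        # audio: (batch, T_a, dim) frame-level audio embeddings
        # video: (batch, T_v, dim) frame-level video embeddings
        a_att, _ = self.audio_to_video(query=audio, key=video, value=video)
        v_att, _ = self.video_to_audio(query=video, key=audio, value=audio)
        # Mean-pool each attended sequence over time and fuse by concatenation.
        fused = torch.cat([a_att.mean(dim=1), v_att.mean(dim=1)], dim=-1)
        return self.classifier(fused)


if __name__ == "__main__":
    model = CrossModalAttentionFusion()
    audio = torch.randn(2, 100, 256)   # e.g., 100 audio frames per clip
    video = torch.randn(2, 25, 256)    # e.g., 25 video frames per clip
    print(model(audio, video).shape)   # torch.Size([2, 4])
```

The two attention blocks let each modality query the other across time, which is one common way to model the cross-modal temporal relationships the paper emphasizes; the actual architecture in the article may differ.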
Pages: 17
Related Papers
50 records in total
  • [31] Peri, Raghuveer; Parthasarathy, Srinivas; Bradshaw, Charles; Sundaram, Shiva. Disentanglement for Audio-Visual Emotion Recognition Using Multitask Setup. 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2021), 2021: 6344-6348.
  • [32] Kim, Yelin; Provost, Emily Mower. ISLA: Temporal Segmentation and Labeling for Audio-Visual Emotion Recognition. IEEE Transactions on Affective Computing, 2019, 10(02): 196-208.
  • [33] Praveen, R. Gnana; Granger, Eric; Cardinal, Patrick. Cross Attentional Audio-Visual Fusion for Dimensional Emotion Recognition. 2021 16th IEEE International Conference on Automatic Face and Gesture Recognition (FG 2021), 2021.
  • [34] Lubis, Nurul; Gomez, Randy; Sakti, Sakriani; Nakamura, Keisuke; Yoshino, Koichiro; Nakamura, Satoshi; Nakadai, Kazuhiro. Construction of Japanese Audio-Visual Emotion Database and Its Application in Emotion Recognition. LREC 2016 - Tenth International Conference on Language Resources and Evaluation, 2016: 2180-2184.
  • [35] Nugroho, Muhammad Adi; Woo, Sangmin; Lee, Sumin; Kim, Changick. Audio-Visual Glance Network for Efficient Video Recognition. 2023 IEEE/CVF International Conference on Computer Vision (ICCV 2023), 2023: 10116-10125.
  • [36] Petridis, Stavros; Stafylakis, Themos; Ma, Pingchuan; Tzimiropoulos, Georgios; Pantic, Maja. Audio-Visual Speech Recognition with a Hybrid CTC/Attention Architecture. 2018 IEEE Workshop on Spoken Language Technology (SLT 2018), 2018: 513-520.
  • [37] Liao, Wen-Yuan; Pao, Tsang-Long; Chen, Yu-Te; Chang, Tsun-Wei. An Audio-Visual Speech Recognition with a New Mandarin Audio-Visual Database. Int Conf on Cybernetics and Information Technologies, Systems and Applications / Int Conf on Computing, Communications and Control Technologies, Vol 1, 2007: 19+.
  • [38] Wei, Jie; Hu, Guanyu; Yang, Xinyu; Luu, Anh Tuan; Dong, Yizhuo. Audio-Visual Domain Adaptation Feature Fusion for Speech Emotion Recognition. Interspeech 2022, 2022: 1988-1992.
  • [39] Zhang, Shiqing; Li, Lemin; Zhao, Zhijin. Audio-Visual Emotion Recognition Based on Facial Expression and Affective Speech. Multimedia and Signal Processing, 2012, 346: 46+.
  • [40] Kim, Yelin; Provost, Emily Mower. Leveraging Inter-rater Agreement for Audio-Visual Emotion Recognition. 2015 International Conference on Affective Computing and Intelligent Interaction (ACII), 2015: 553-559.