A Neural Network Architecture for Children's Audio-Visual Emotion Recognition

Cited by: 1
Authors
Matveev, Anton [1]
Matveev, Yuri [1]
Frolova, Olga [1]
Nikolaev, Aleksandr [1]
Lyakso, Elena [1]
Affiliations
[1] St Petersburg Univ, Dept Higher Nervous Act & Psychophysiol, Child Speech Res Grp, St Petersburg 199034, Russia
Funding
Russian Science Foundation
Keywords
audio-visual speech; emotion recognition; children; MULTIMODAL FUSION; SPEECH; AGE;
DOI
10.3390/math11224573
Chinese Library Classification
O1 [Mathematics];
Subject Classification Code
0701; 070101;
Abstract
Detecting and understanding emotions are critical for our daily activities. As emotion recognition (ER) systems mature, research is moving beyond acted adult audio-visual speech toward more difficult cases. In this work, we investigate the automatic classification of children's audio-visual emotional speech, which presents several challenges, including the lack of publicly available annotated datasets and the low performance of state-of-the-art audio-visual ER systems on children's speech. We first present a new corpus of children's audio-visual emotional speech that we collected. We then propose a neural network solution that makes better use of the temporal relationships between the audio and video modalities during cross-modal fusion for children's audio-visual emotion recognition. We select a state-of-the-art neural network architecture as a baseline and introduce several modifications aimed at deeper learning of the cross-modal temporal relationships using attention. In experiments comparing the proposed approach with the baseline model, we observe a relative performance improvement of 2%. We conclude that focusing on cross-modal temporal relationships may be beneficial for building ER systems for child-machine communication and for environments where qualified professionals work with children.
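The kind of attention-based cross-modal temporal fusion described in the abstract can be illustrated with a short sketch. This is not the authors' implementation; it is a minimal PyTorch example in which each modality attends to the other's time steps before classification, and the feature dimensions, number of attention heads, and the choice of four emotion classes are illustrative assumptions rather than values taken from the paper.

import torch
import torch.nn as nn

class CrossModalAttentionFusion(nn.Module):
    """Fuses audio and video frame sequences by letting each modality
    attend to the other's time steps before emotion classification."""

    def __init__(self, audio_dim=128, video_dim=256, model_dim=128,
                 num_heads=4, num_emotions=4):
        super().__init__()
        # Project both modalities into a shared embedding space.
        self.audio_proj = nn.Linear(audio_dim, model_dim)
        self.video_proj = nn.Linear(video_dim, model_dim)
        # Cross-modal attention: queries from one modality, keys/values from the other.
        self.audio_to_video = nn.MultiheadAttention(model_dim, num_heads, batch_first=True)
        self.video_to_audio = nn.MultiheadAttention(model_dim, num_heads, batch_first=True)
        self.classifier = nn.Linear(2 * model_dim, num_emotions)

    def forward(self, audio, video):
        # audio: (batch, T_audio, audio_dim); video: (batch, T_video, video_dim)
        a = self.audio_proj(audio)
        v = self.video_proj(video)
        # Each modality queries the temporal structure of the other one.
        a_att, _ = self.audio_to_video(query=a, key=v, value=v)
        v_att, _ = self.video_to_audio(query=v, key=a, value=a)
        # Pool over time and concatenate the two cross-modal summaries.
        fused = torch.cat([a_att.mean(dim=1), v_att.mean(dim=1)], dim=-1)
        return self.classifier(fused)

# Usage with random tensors standing in for frame-level features.
model = CrossModalAttentionFusion()
audio_feats = torch.randn(2, 100, 128)    # e.g. 100 audio frames per clip
video_feats = torch.randn(2, 50, 256)     # e.g. 50 video frames per clip
logits = model(audio_feats, video_feats)  # shape: (2, num_emotions)

The two attention blocks allow the audio and video streams to have different frame rates and lengths, which is the practical motivation for modeling cross-modal temporal relationships explicitly rather than simply concatenating pooled features.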
Pages: 17
Related Papers
(50 in total)
  • [21] Audio-Visual (Multimodal) Speech Recognition System Using Deep Neural Network
    Paulin, Hebsibah
    Milton, R. S.
    JanakiRaman, S.
    Chandraprabha, K.
    JOURNAL OF TESTING AND EVALUATION, 2019, 47 (06) : 3963 - 3974
  • [22] Fuzzy-Neural-Network Based Audio-Visual Fusion for Speech Recognition
    Wu, Gin-Der
    Tsai, Hao-Shu
    2019 1ST INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE IN INFORMATION AND COMMUNICATION (ICAIIC 2019), 2019, : 210 - 214
  • [23] An Active Learning Paradigm for Online Audio-Visual Emotion Recognition
    Kansizoglou, Ioannis
    Bampis, Loukas
    Gasteratos, Antonios
    IEEE TRANSACTIONS ON AFFECTIVE COMPUTING, 2022, 13 (02) : 756 - 768
  • [24] Robustness of a chaotic modal neural network applied to audio-visual speech recognition
    Kabre, H
    NEURAL NETWORKS FOR SIGNAL PROCESSING VII, 1997, : 607 - 616
  • [25] MANDARIN AUDIO-VISUAL SPEECH RECOGNITION WITH EFFECTS TO THE NOISE AND EMOTION
    Pao, Tsang-Long
    Liao, Wen-Yuan
    Chen, Yu-Te
    Wu, Tsan-Nung
    INTERNATIONAL JOURNAL OF INNOVATIVE COMPUTING INFORMATION AND CONTROL, 2010, 6 (02): : 711 - 723
  • [26] Multimodal and Temporal Perception of Audio-visual Cues for Emotion Recognition
    Ghaleb, Esam
    Popa, Mirela
    Asteriadis, Stylianos
    2019 8TH INTERNATIONAL CONFERENCE ON AFFECTIVE COMPUTING AND INTELLIGENT INTERACTION (ACII), 2019,
  • [27] RBF neural network mouth tracking for audio-visual speech recognition system
    Hui, LE
    Seng, KP
    Tse, KM
    TENCON 2004 - 2004 IEEE REGION 10 CONFERENCE, VOLS A-D, PROCEEDINGS: ANALOG AND DIGITAL TECHNIQUES IN ELECTRICAL ENGINEERING, 2004, : A84 - A87
  • [28] Semantic audio-visual data fusion for automatic emotion recognition
    Datcu, Dragos
    Rothkrantz, Leon J. M.
    EUROMEDIA '2008, 2008, : 58 - 65
  • [29] Multimodal Emotion Recognition using Physiological and Audio-Visual Features
    Matsuda, Yuki
    Fedotov, Dmitrii
    Takahashi, Yuta
    Arakawa, Yutaka
    Yasumo, Keiichi
    Minker, Wolfgang
    PROCEEDINGS OF THE 2018 ACM INTERNATIONAL JOINT CONFERENCE ON PERVASIVE AND UBIQUITOUS COMPUTING AND PROCEEDINGS OF THE 2018 ACM INTERNATIONAL SYMPOSIUM ON WEARABLE COMPUTERS (UBICOMP/ISWC'18 ADJUNCT), 2018, : 946 - 951
  • [30] A PRE-TRAINED AUDIO-VISUAL TRANSFORMER FOR EMOTION RECOGNITION
    Minh Tran
    Soleymani, Mohammad
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 4698 - 4702