Integrative interaction of emotional speech in audio-visual modality

Cited by: 2
Authors
Dong, Haibin [1 ]
Li, Na [1 ]
Fan, Lingzhong [2 ]
Wei, Jianguo [1 ]
Xu, Junhai [1 ]
Affiliations
[1] Tianjin Univ, Coll Intelligence & Comp, Tianjin Key Lab Cognit Comp & Applicat, Tianjin, Peoples R China
[2] Chinese Acad Sci, Inst Automat, Brainnetome Ctr, Beijing, Peoples R China
Funding
China Postdoctoral Science Foundation; National Natural Science Foundation of China;
Keywords
audio-visual integration; emotional speech; fMRI; left insula; weighted RSA; SUPERIOR TEMPORAL SULCUS; HUMAN BRAIN; PERCEPTION; FACE; INFORMATION; EXPRESSIONS; ACTIVATION; PRECUNEUS; INSULA; VOICE;
DOI
10.3389/fnins.2022.797277
CLC Number (Chinese Library Classification)
Q189 [Neuroscience];
Subject Classification Code
071006;
Abstract
Emotional cues are expressed in many ways in daily life, and the emotional information we receive is often conveyed through multiple modalities. Successful social interaction requires combining multisensory cues to accurately judge the emotions of others. The integration of multimodal emotional information has been widely investigated: studies using various measures of brain activity have localized the regions involved in the audio-visual integration of emotional information, mainly to the bilateral superior temporal regions. However, the analysis methods adopted in these studies are relatively simple, and their stimuli rarely contain speech, so the integration of emotional speech in the human brain requires further examination. In this paper, an event-related functional magnetic resonance imaging (fMRI) study was conducted to explore the audio-visual integration of emotional speech in the human brain, using dynamic facial expressions and emotional speech to convey emotions of different valences. Representational similarity analysis (RSA) based on regions of interest (ROIs), whole-brain searchlight analysis, modality conjunction analysis, and supra-additive analysis were used to identify and verify the brain regions involved. In addition, a weighted RSA method was used to evaluate the contribution of each candidate model to the best-fitting model for each ROI. Only the left insula was detected by all methods, suggesting that the left insula plays an important role in the audio-visual integration of emotional speech. The whole-brain searchlight, modality conjunction, and supra-additive analyses together further indicated that the bilateral middle temporal gyrus (MTG), right inferior parietal lobule, and bilateral precuneus may also be involved in the audio-visual integration of emotional speech.
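To make the analysis pipeline more concrete, the short Python sketch below illustrates the general idea of ROI-based RSA and a weighted variant. It is not the authors' code: the array names (patterns for condition-by-voxel response estimates extracted from one ROI, model_rdms for candidate model dissimilarity matrices) and the use of non-negative least squares to fit the model weights are illustrative assumptions, not details taken from the paper.

    # Minimal sketch of ROI-based RSA and weighted RSA (illustrative only).
    import numpy as np
    from scipy.stats import spearmanr
    from scipy.optimize import nnls

    def neural_rdm(patterns):
        # Representational dissimilarity matrix: 1 - Pearson correlation
        # between the multivoxel patterns of every pair of conditions.
        # patterns: (n_conditions, n_voxels) array of response estimates.
        return 1.0 - np.corrcoef(patterns)

    def rsa_score(patterns, model_rdm):
        # Compare the neural RDM with one candidate model RDM using
        # Spearman rank correlation over the lower triangles.
        idx = np.tril_indices_from(model_rdm, k=-1)
        rdm = neural_rdm(patterns)
        rho, _ = spearmanr(rdm[idx], model_rdm[idx])
        return rho

    def weighted_rsa(patterns, model_rdms):
        # Weighted RSA: fit a non-negative weighted combination of candidate
        # model RDMs to the neural RDM; the resulting weights indicate each
        # model's contribution to the best-fitting combination.
        idx = np.tril_indices_from(model_rdms[0], k=-1)
        y = neural_rdm(patterns)[idx]
        X = np.column_stack([m[idx] for m in model_rdms])
        weights, _ = nnls(X, y)
        return weights

In a searchlight version of the same idea, rsa_score would simply be evaluated on the pattern extracted from a small sphere around every voxel rather than on predefined ROIs.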
Pages: 13