Integrative interaction of emotional speech in audio-visual modality

Cited by: 2
Authors
Dong, Haibin [1 ]
Li, Na [1 ]
Fan, Lingzhong [2 ]
Wei, Jianguo [1 ]
Xu, Junhai [1 ]
Affiliations
[1] Tianjin Univ, Coll Intelligence & Comp, Tianjin Key Lab Cognit Comp & Applicat, Tianjin, Peoples R China
[2] Chinese Acad Sci, Inst Automat, Brainnetome Ctr, Beijing, Peoples R China
Funding
China Postdoctoral Science Foundation; National Natural Science Foundation of China;
Keywords
audio-visual integration; emotional speech; fMRI; left insula; weighted RSA; SUPERIOR TEMPORAL SULCUS; HUMAN BRAIN; PERCEPTION; FACE; INFORMATION; EXPRESSIONS; ACTIVATION; PRECUNEUS; INSULA; VOICE;
DOI
10.3389/fnins.2022.797277
Chinese Library Classification (CLC)
Q189 [Neuroscience];
Subject classification code
071006;
Abstract
Emotional cues are expressed in many ways in daily life, and the emotional information we receive is often conveyed through multiple modalities. Successful social interaction requires combining multisensory cues to accurately judge the emotions of others. The integration mechanism of multimodal emotional information has been widely investigated: studies using different brain activity measurement methods have localized the brain regions involved in the audio-visual integration of emotional information mainly to the bilateral superior temporal regions. However, the methods adopted in these studies are relatively simple, and the stimulus materials rarely contain speech information, so the integration mechanism of emotional speech in the human brain requires further examination. In this paper, an event-related functional magnetic resonance imaging (fMRI) study was conducted to explore the audio-visual integration mechanism of emotional speech in the human brain, using dynamic facial expressions and emotional speech to convey emotions of different valences. Representational similarity analysis (RSA) based on regions of interest (ROIs), whole-brain searchlight analysis, modality conjunction analysis, and supra-additive analysis were used to identify and verify the relevant brain regions. In addition, a weighted RSA method was used to evaluate the contribution of each candidate model to the best-fitting model for each ROI. The results showed that only the left insula was detected by all methods, suggesting that it plays an important role in the audio-visual integration of emotional speech. Whole-brain searchlight, modality conjunction, and supra-additive analyses together revealed that the bilateral middle temporal gyrus (MTG), right inferior parietal lobule, and bilateral precuneus may also be involved in the audio-visual integration of emotional speech.
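To make the analysis pipeline concrete, below is a minimal Python sketch of ROI-based RSA and a weighted-RSA model fit. It is an illustration under stated assumptions, not the authors' implementation: the input names (patterns, model_rdms), the correlation-distance RDM, the non-negative least-squares combination of candidate models, and the toy dimensions are all hypothetical.

    # Minimal sketch of ROI-based RSA and a weighted-RSA model fit.
    # Assumptions (not from the paper): ROI responses arrive as an
    # (n_conditions x n_voxels) matrix, RDMs use correlation distance,
    # and candidate models are combined by non-negative least squares.
    import numpy as np
    from scipy.optimize import nnls
    from scipy.spatial.distance import pdist, squareform
    from scipy.stats import spearmanr

    def neural_rdm(patterns):
        # Correlation-distance RDM over the condition patterns.
        return squareform(pdist(patterns, metric="correlation"))

    def rsa_score(data_rdm, model_rdm):
        # Spearman correlation between the lower triangles of two RDMs.
        tri = np.tril_indices_from(data_rdm, k=-1)
        rho, _ = spearmanr(data_rdm[tri], model_rdm[tri])
        return rho

    def weighted_rsa(data_rdm, model_rdms):
        # Fit the data RDM as a non-negative weighted sum of candidate
        # model RDMs; larger weights mark larger model contributions.
        tri = np.tril_indices_from(data_rdm, k=-1)
        X = np.column_stack([m[tri] for m in model_rdms])
        weights, _residual = nnls(X, data_rdm[tri])
        return weights

    # Toy usage: 4 conditions (e.g., modality x valence) and 200 voxels
    # in a hypothetical ROI such as the left insula.
    rng = np.random.default_rng(0)
    patterns = rng.standard_normal((4, 200))
    data_rdm = neural_rdm(patterns)
    model_rdms = [squareform(pdist(rng.standard_normal((4, 5))))
                  for _ in range(3)]
    print(rsa_score(data_rdm, model_rdms[0]))
    print(weighted_rsa(data_rdm, model_rdms))

By analogy, the searchlight analysis repeats the RSA step over small spheres across the whole brain, and a supra-additive test compares the response to audio-visual stimulation against the sum of the unimodal responses (AV > A + V) within each candidate region.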
Pages: 13
Related papers
50 records in total
  • [11] Audio-Visual Speech Codecs: Rethinking Audio-Visual Speech Enhancement by Re-Synthesis
    Yang, Karren
    Markovic, Dejan
    Krenn, Steven
    Agrawal, Vasu
    Richard, Alexander
    [J]. 2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2022, : 8217 - 8227
  • [12] Effects of aging on audio-visual speech integration
    Huyse, Aurelie
    Leybaert, Jacqueline
    Berthommier, Frederic
    [J]. JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 2014, 136 (04): : 1918 - 1931
  • [13] The Vera am Mittag German Audio-Visual Emotional Speech Database
    Grimm, Michael
    Kroschel, Kristian
    Narayanan, Shrikanth
    [J]. 2008 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO, VOLS 1-4, 2008, : 865 - +
  • [14] Author Correction: A Cantonese Audio-Visual Emotional Speech (CAVES) dataset
    Chee Seng Chong
    Chris Davis
    Jeesun Kim
    [J]. Behavior Research Methods, 2024, 56 (6) : 6410 - 6410
  • [15] Audio-visual speech recognition based on joint training with audio-visual speech enhancement for robust speech recognition
    Hwang, Jung-Wook
    Park, Jeongkyun
    Park, Rae-Hong
    Park, Hyung-Min
    [J]. APPLIED ACOUSTICS, 2023, 211
  • [16] An audio-visual speech recognition system for testing new audio-visual databases
    Pao, Tsang-Long
    Liao, Wen-Yuan
    [J]. VISAPP 2006: PROCEEDINGS OF THE FIRST INTERNATIONAL CONFERENCE ON COMPUTER VISION THEORY AND APPLICATIONS, VOL 2, 2006, : 192 - +
  • [17] Learning Contextually Fused Audio-Visual Representations for Audio-Visual Speech Recognition
    Zhang, Zi-Qiang
    Zhang, Jie
    Zhang, Jian-Shu
    Wu, Ming-Hui
    Fang, Xin
    Dai, Li-Rong
    [J]. 2022 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, ICIP, 2022, : 1346 - 1350
  • [18] Separation of audio-visual speech sources: A new approach exploiting the audio-visual coherence of speech stimuli
    Sodoyer, D
    Schwartz, JL
    Girin, L
    Klinkisch, J
    Jutten, C
    [J]. EURASIP JOURNAL ON APPLIED SIGNAL PROCESSING, 2002, 2002 (11) : 1165 - 1173
  • [19] Separation of Audio-Visual Speech Sources: A New Approach Exploiting the Audio-Visual Coherence of Speech Stimuli
    David Sodoyer
    Jean-Luc Schwartz
    Laurent Girin
    Jacob Klinkisch
    Christian Jutten
    [J]. EURASIP Journal on Advances in Signal Processing, 2002
  • [20] A Turkish Audio-Visual Emotional Database
    Onder, Onur
    Zhalehpour, Sara
    Erdem, Cigdem Eroglu
    [J]. 2013 21ST SIGNAL PROCESSING AND COMMUNICATIONS APPLICATIONS CONFERENCE (SIU), 2013,