EMID: An Emotional Aligned Dataset in Audio-Visual Modality

Citations: 0
Authors
Zou, Jialing [1 ]
Mei, Jiahao [1 ]
Ye, Guangze [1 ]
Huai, Tianyu [1 ]
Shen, Qiwei [1 ]
Dong, Daoguo [1 ]
Affiliation
[1] East China Normal Univ, Shanghai, Peoples R China
Keywords
Music-Image Dataset; Emotional Matching; Cross-modal Alignment
DOI
10.1145/3607541.3616821
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
In this paper, we propose the Emotionally paired Music and Image Dataset (EMID), a novel dataset designed for the emotional matching of music and images, to facilitate auditory-visual cross-modal tasks such as generation and retrieval. Unlike existing approaches that focus primarily on semantic correlations or coarsely divided emotional relations, EMID emphasizes the significance of emotional consistency between music and images, using an advanced 13-dimensional emotional model. By incorporating emotional alignment into the dataset, it aims to establish pairs that closely match human perceptual understanding, thereby improving the performance of auditory-visual cross-modal tasks. We also design a supplemental module, named EMI-Adapter, to optimize existing cross-modal alignment methods. To validate the effectiveness of EMID, we conduct a psychological experiment, which demonstrates that considering the emotional relationship between the two modalities effectively improves matching accuracy from an abstract, perceptual perspective. This research lays the foundation for future cross-modal research in domains such as psychotherapy and contributes to advancing the understanding and utilization of emotions in cross-modal alignment. The EMID dataset is available at https://github.com/ecnu-aigc/EMID.
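The abstract describes pairing music and images by emotional consistency under a 13-dimensional emotion model. As a minimal sketch of that idea (not the authors' actual pipeline: the function name, the nearest-neighbor rule, and the toy data below are all illustrative assumptions), items annotated with 13-dimensional emotion vectors could be matched by cosine similarity:

```python
import numpy as np

def pair_by_emotion(music_emotions, image_emotions):
    """Pair each music clip with the image whose 13-d emotion vector
    is closest in cosine similarity. Illustrative sketch only."""
    m = music_emotions / np.linalg.norm(music_emotions, axis=1, keepdims=True)
    i = image_emotions / np.linalg.norm(image_emotions, axis=1, keepdims=True)
    sim = m @ i.T               # pairwise cosine similarities, shape (n_music, n_images)
    return sim.argmax(axis=1)   # index of the best-matching image for each clip

# Toy example: 3 music clips and 4 images with random 13-d emotion vectors.
rng = np.random.default_rng(0)
music = rng.random((3, 13))
images = rng.random((4, 13))
pairs = pair_by_emotion(music, images)  # one image index per music clip
```

EMID itself refines such matches perceptually (e.g. via the EMI-Adapter module and a psychological validation experiment); this sketch only shows the basic vector-space matching step.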
Pages: 41-48 (8 pages)