Exploiting Evidential Theory in the Fusion of Textual, Audio, and Visual Modalities for Affective Music Video Retrieval

Cited by: 0
Authors
Nemati, Shahla [1 ]
Naghsh-Nilchi, Ahmad Reza [2 ]
Affiliations
[1] Shahrekord Univ, Dept Comp Engn, Fac Engn, Shahrekord, Iran
[2] Univ Isfahan, Fac Comp Engn, Dept Artificial Intelligence, Esfahan, Iran
Keywords
Affective music video retrieval; Lexicon-based sentiment analysis; Information fusion; Emotion detection; SENTIMENT ANALYSIS; FRAMEWORK;
DOI
Not available
CLC Number (Chinese Library Classification)
TP18 [Theory of Artificial Intelligence]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
Developing techniques to retrieve video content according to its impact on viewers' emotions is the main goal of affective video retrieval systems. Existing systems mainly apply a multimodal approach that fuses information from different modalities to determine the affect category. In this paper, the effect of exploiting two types of textual information to enrich the audio-visual content of music videos is evaluated: subtitles or song lyrics, and texts obtained from viewers' comments on video-sharing websites. To determine the emotional content of the texts, an unsupervised lexicon-based method is applied. This method does not require any human-annotated corpus for training and is much faster than supervised approaches. To integrate these modalities, a new information fusion method is proposed based on the Dempster-Shafer theory of evidence. Experiments are conducted on the video clips of the DEAP dataset and their associated viewers' comments on YouTube. Results show that incorporating song lyrics with the audio-visual content has no positive effect on retrieval performance, whereas exploiting viewers' comments significantly improves the affective retrieval system. This can be explained by the fact that viewers' affective responses depend not only on the video itself but also on its context.
Pages: 222-228
Number of pages: 7
Related Papers
50 records in total
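The abstract above describes fusing modality outputs with the Dempster-Shafer theory of evidence. As a rough illustration of how such evidential fusion operates, the sketch below applies Dempster's rule of combination to two hypothetical mass functions, one from an audio-visual classifier and one from a comment-based lexicon score. The class names, mass values, and function names are illustrative assumptions, not the paper's actual frame of discernment or fusion formulation.

```python
from itertools import product

def combine_masses(m1, m2):
    """Combine two mass functions with Dempster's rule of combination.

    Each mass function is a dict mapping a frozenset of hypotheses to its
    assigned mass. Mass falling on an empty intersection is treated as
    conflict and normalised out.
    """
    combined = {}
    conflict = 0.0
    for (a, ma), (b, mb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + ma * mb
        else:
            conflict += ma * mb  # mass assigned to the empty set
    if conflict >= 1.0:
        raise ValueError("Total conflict: sources cannot be combined")
    k = 1.0 - conflict
    return {h: m / k for h, m in combined.items()}

# Toy example over a frame {pos, neg} with hypothetical masses:
# one source from the audio-visual content, one from viewers' comments.
audio_visual = {
    frozenset({"pos"}): 0.6,
    frozenset({"neg"}): 0.1,
    frozenset({"pos", "neg"}): 0.3,  # uncommitted mass (ignorance)
}
comments = {
    frozenset({"pos"}): 0.5,
    frozenset({"neg"}): 0.2,
    frozenset({"pos", "neg"}): 0.3,
}
print(combine_masses(audio_visual, comments))
```

In this toy run the combined mass concentrates on the hypothesis both sources support, which is the intuition behind using evidential fusion to let a weaker textual source reinforce or temper the audio-visual prediction.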
  • [21] A fusion scheme of visual and auditory modalities for event detection in sports video
    Xu, M
    Duan, LY
    Xu, CS
    Tian, Q
[J]. 2003 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL III, PROCEEDINGS: IMAGE & MULTIDIMENSIONAL SIGNAL PROCESSING, SIGNAL PROCESSING EDUCATION, 2003, : 189 - 192
  • [22] Acoustic Event Detection Based on Feature-Level Fusion of Audio and Video Modalities
    Butko, Taras
    Canton-Ferrer, Cristian
    Segura, Carlos
Giró, Xavier
    Nadeu, Climent
    Hernando, Javier
    Casas, Josep R.
    [J]. EURASIP JOURNAL ON ADVANCES IN SIGNAL PROCESSING, 2011,
  • [24] Information Fusion for Combining Visual and Textual Image Retrieval in ImageCLEF@ICPR
    Zhou, Xin
    Depeursinge, Adrien
    Mueller, Henning
    [J]. RECOGNIZING PATTERNS IN SIGNALS, SPEECH, IMAGES, AND VIDEOS, 2010, 6388 : 129 - 137
  • [25] EmoMV: Affective music-video correspondence learning datasets for classification and retrieval
    Thao, Ha Thi Phuong
    Roig, Gemma
    Herremans, Dorien
    [J]. INFORMATION FUSION, 2023, 91 : 64 - 79
  • [26] AVaTER: Fusing Audio, Visual, and Textual Modalities Using Cross-Modal Attention for Emotion Recognition
    Das, Avishek
    Sarma, Moumita Sen
    Hoque, Mohammed Moshiul
    Siddique, Nazmul
    Dewan, M. Ali Akber
    [J]. SENSORS, 2024, 24 (18)
  • [27] Visual audio and textual triplet fusion network for multi-modal sentiment analysis
    Lv, Cai-Chao
    Zhang, Xuan
    Zhang, Hong-Bo
    [J]. SIGNAL IMAGE AND VIDEO PROCESSING, 2024,
  • [28] Semantic analysis based on fusion of audio/visual features for soccer video
    Wang, Zengkai
    [J]. PROCEEDINGS OF THE 10TH INTERNATIONAL CONFERENCE OF INFORMATION AND COMMUNICATION TECHNOLOGY, 2021, 183 : 563 - 571
  • [29] Attention-Based Audio-Visual Fusion for Video Summarization
    Fang, Yinghong
    Zhang, Junpeng
    Lu, Cewu
    [J]. NEURAL INFORMATION PROCESSING (ICONIP 2019), PT II, 2019, 11954 : 328 - 340
  • [30] Video scene retrieval with symbol sequence based on integrated audio and visual features
    Morisawa, K
    Nitta, N
    Babaguchi, N
    [J]. MULTIMEDIA CONTENT ANALYSIS, MANAGEMENT, AND RETRIEVAL 2006, 2006, 6073