Exploiting Evidential Theory in the Fusion of Textual, Audio, and Visual Modalities for Affective Music Video Retrieval

Cited by: 0

Authors:

Nemati, Shahla [1]

Naghsh-Nilchi, Ahmad Reza [2]

Affiliations:

[1] Shahrekord Univ, Dept Comp Engn, Fac Engn, Shahrekord, Iran

[2] Univ Isfahan, Fac Comp Engn, Dept Artificial Intelligence, Esfahan, Iran

Source:

2017 3RD INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION AND IMAGE ANALYSIS (IPRIA) | 2017

Keywords:

Affective music video retrieval; Lexicon-based sentiment analysis; Information fusion; Emotion detection; SENTIMENT ANALYSIS; FRAMEWORK;

DOI:

Not available

Chinese Library Classification number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Developing techniques to retrieve video content with regard to its impact on viewers' emotions is the main goal of affective video retrieval systems. Existing systems mainly apply a multimodal approach that fuses information from different modalities to specify the affect category. In this paper, the effect of exploiting two types of textual information to enrich the audio-visual content of music videos is evaluated: subtitles or song lyrics, and texts obtained from viewers' comments on video-sharing websites. To specify the emotional content of the texts, an unsupervised lexicon-based method is applied. This method does not need any human-coded corpus for training and is much faster than supervised approaches. To integrate these modalities, a new information fusion method based on the Dempster-Shafer theory of evidence is proposed. Experiments are conducted on the video clips of the DEAP dataset and their associated viewers' comments on YouTube. Results show that incorporating song lyrics with the audio-visual content has no positive effect on retrieval performance, whereas exploiting viewers' comments significantly improves the affective retrieval system. This could be explained by the fact that viewers' affective responses depend not only on the video itself but also on its context.
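
The abstract names two building blocks, a lexicon-based text scorer and fusion based on the Dempster-Shafer theory of evidence, without giving details. The Python sketch below illustrates only the textbook form of Dempster's rule of combination applied to two evidence sources; it is not the authors' fusion method. The two-class frame, the toy lexicon, the `reliability` discount, the example masses, and all function names are assumptions made for illustration.

```python
from itertools import product

# Hypothetical two-class frame of discernment (an assumption, not from the paper).
POSITIVE = frozenset({"positive"})
NEGATIVE = frozenset({"negative"})
FRAME = POSITIVE | NEGATIVE  # the whole frame represents ignorance

# Toy polarity lexicon, invented for this example.
TOY_LEXICON = {"great": 1, "love": 1, "sad": -1, "boring": -1}


def lexicon_mass(tokens, lexicon, reliability=0.8):
    """Turn lexicon hits into a mass function (illustrative heuristic only).

    A share `reliability` of the mass is split between the classes in
    proportion to the hit counts; the remainder stays on the whole frame.
    """
    pos = sum(1 for t in tokens if lexicon.get(t, 0) > 0)
    neg = sum(1 for t in tokens if lexicon.get(t, 0) < 0)
    total = pos + neg
    if total == 0:
        return {FRAME: 1.0}  # no lexicon hits: total ignorance
    mass = {FRAME: 1.0 - reliability}
    if pos:
        mass[POSITIVE] = reliability * pos / total
    if neg:
        mass[NEGATIVE] = reliability * neg / total
    return mass


def combine(m1, m2):
    """Dempster's rule of combination for two mass functions.

    Each mass function maps frozenset focal elements to masses summing to 1.
    """
    combined = {}
    conflict = 0.0
    for (a, mass_a), (b, mass_b) in product(m1.items(), m2.items()):
        intersection = a & b
        if intersection:
            combined[intersection] = combined.get(intersection, 0.0) + mass_a * mass_b
        else:
            conflict += mass_a * mass_b  # product mass on contradictory pairs
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict; the rule is undefined")
    # Renormalise by the non-conflicting mass.
    return {focal: mass / (1.0 - conflict) for focal, mass in combined.items()}


# Made-up audio-visual evidence, fused with evidence from one toy comment.
audio_visual_mass = {POSITIVE: 0.6, NEGATIVE: 0.3, FRAME: 0.1}
comment_mass = lexicon_mass("great song but sad ending great".split(), TOY_LEXICON)
fused = combine(audio_visual_mass, comment_mass)
```

Keeping some mass on the whole frame lets a source express uncertainty instead of forcing a class decision, which is the usual motivation for evidential fusion over plain probability averaging.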

Citation:

Pages: 222-228

Page count: 7
