Cross-media retrieval via fusing multi-modality and multi-grained data

Cited by: 0

Authors:

Liu, Z. [1,2]

Yuan, S. [1,2]

Pei, X. [1,2]

Gao, S. [1,2]

Han, H. [1,2]

Affiliations:

[1] Shandong University of Finance and Economics, School of Computer Science and Technology, Jinan 250014, Shandong, People's Republic of China

[2] Shandong University of Finance and Economics, Shandong Provincial Key Laboratory of Digital Media Technology, Jinan 250014, Shandong, People's Republic of China

Source:

SCIENTIA IRANICA | 2023 / Vol. 30 / Issue 05

Keywords:

Cross-media retrieval; Multi-modality data; Multi-grained data; Multi-margin triplet loss; Margin-set;

DOI:

10.24200/sci.2023.59834.6456

Chinese Library Classification (CLC) number:

T [Industrial Technology];

Subject classification code:

08;

Abstract:

Traditional cross-media retrieval methods mainly focus on coarse-grained data that reflect global characteristics and ignore fine-grained descriptions of local details. They also cannot accurately describe the correlations between the anchor and irrelevant data. This paper addresses both problems by fusing coarse-grained and fine-grained features and by introducing a multi-margin triplet loss, within a dual framework: (1) Framework I, a multi-grained data fusion framework based on a Deep Belief Network, and (2) Framework II, a multi-modality data fusion framework based on the multi-margin triplet loss function. In Framework I, the coarse-grained and fine-grained features are fused by a joint Restricted Boltzmann Machine and then fed into Framework II. In Framework II, we propose the multi-margin triplet loss, under which data belonging to different modalities and semantic categories are stepped away from the anchor by different margins. Experimental results show that the proposed method achieves better cross-media retrieval performance than the compared methods on different datasets. Furthermore, ablation experiments verify that the proposed multi-grained fusion strategy and multi-margin triplet loss function are effective. (c) 2023 Sharif University of Technology. All rights reserved.
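The abstract does not give the formula for the multi-margin triplet loss, so the following is only a minimal sketch of the general idea under stated assumptions: a standard hinge-style triplet loss in which each negative sample is assigned a margin looked up from a margin-set according to its group. The function name `multi_margin_triplet_loss`, the group labels `same_modality` and `cross_modality`, the squared Euclidean distance, and the margin values are all hypothetical and are not taken from the paper.

```python
import numpy as np

def multi_margin_triplet_loss(anchor, positive, negatives, negative_groups, margin_set):
    """Hinge-style triplet loss with one margin per negative group.

    Illustrative assumption only: the paper's actual loss may differ.
    """
    # Squared Euclidean distance between the anchor and its positive sample.
    d_pos = np.sum((anchor - positive) ** 2)
    losses = []
    for neg, group in zip(negatives, negative_groups):
        # Squared Euclidean distance between the anchor and this negative.
        d_neg = np.sum((anchor - neg) ** 2)
        # The margin depends on which group the negative belongs to.
        margin = margin_set[group]
        losses.append(max(0.0, d_pos - d_neg + margin))
    # Average the hinge terms over all negatives.
    return float(np.mean(losses))

# Hypothetical margin-set; the values are arbitrary.
margin_set = {"same_modality": 0.2, "cross_modality": 0.5}

anchor = np.array([0.0, 0.0])
positive = np.array([0.1, 0.0])
negatives = [np.array([1.0, 0.0]), np.array([0.5, 0.0])]
groups = ["same_modality", "cross_modality"]

loss = multi_margin_triplet_loss(anchor, positive, negatives, groups, margin_set)
print(loss)
```

In this toy case the first negative is already farther from the anchor than its margin requires and contributes nothing, while the second violates its larger margin and contributes a positive term. This is the sense in which different groups are kept at different distances from the anchor.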


Pages: 1645-1669

Number of pages: 25
